THE ROLE OF STATISTICS IN PHYSIOLOGICAL
RESEARCH* 


C. W. EMMENS
Department of Veterinary Physiology, 
University of Sydney, 
Sydney, N.S.W., Australia. 


Physiologists were introduced to biometry in its modern form in 
the guise of bioassay. They have been accustomed for a long time to 
more classical forms of biometry, such as the measurement of responses 
and their treatment by various mathematical manipulations, in par- 
ticular in attempts at curve fitting. However, the statistical evaluation 
of results, or curve-fitting by such methods as orthogonal polynomials, 
or better still, the planning of experiments along statistical lines is not 
frequently seen. Thus, in a six months run of the Journal of Physiology 
for 1958, there were 77 papers, many of which included quite an amount 
of mathematics. Of these 77 papers only 16 had any statistical content,
nearly always of very elementary form, standard errors or an occasional 
t-test. Only 3 had more than this—one an analysis of variance, two 
others calculations of correlation or regression. A transfer from bioassay 
to general research has thus not occurred and the penetration of statis-
tics into physiology has only just begun.

Except in a few specialised fields such as genetics or toxicology, it
seems often to be assumed, by those who think about it at all, that 
animals are not very adaptable to statistical handling. It is expected 
that the results of experiments of similar structure to those seen in 
agriculture would be at least as complex to handle, with frequent 
interactions and sometimes with difficulty in interpretation. It has 
been a pleasant surprise to find that this is often not the case, and our 
impression is that factorial designs, Latin squares and other partially 
replicated designs less often show significant interaction in physiology 
than in agriculture or, say, industrial chemistry. As a somewhat 
extreme example, we have shown in my own department that highly 
informative results were obtained from a 1/16th replicate of a 2” 
factorial experiment involving 256 breeding pairs of laboratory mice. 


*The substance of a Presidential Address delivered at the Biennial Meeting of the Australasian 
Region of the Biometric Society in Melbourne, August, 1950. 
There were highly significant main effects and only one highly significant
first order interaction. 


Types of enquiry 

Despite the multitudinous forms of physiological research, much 
experimental work can usefully be considered as falling into one or 
more of four main classes. These depend on whether comparisons are 
made between or within animals, or preparations from them; and on 
whether observations are quantal or graded. Within-animal com- 
parisons are common in physiological work: the action of drugs in an 
organ bath is the example that comes most readily to mind. A strip 
of gut is attached to a recording device and its contractions under the 
influence of various drugs or drug concentrations are measured. Such 
“assays” are not usually submitted to statistical examination, however, 
despite the ease with which this is possible. If within-animal com- 
parisons are not feasible, as when the animal must be killed for the 
examination of an organ, within-litter comparisons may be substituted. 
Otherwise, as in dietary tests or tests of the action of many hormones, 
groups of unrelated animals, although usually of similar origin, are 
frequently employed for between-animal comparisons. 

Graded responses are usually to be preferred to quantal responses, 
with semi-quantal responses falling between. A graded response gives 
more information than a quantal response, and covers a wider range of 
treatment effect, so that there is the additional safeguard that the 
doses given are more likely to fall in the range of useful observation 


TABLE 1

EXAMPLES FROM BIOASSAY OF THE GAIN FROM USING WITHIN-ANIMAL OR
WITHIN-LITTER COMPARISONS AND OF USING A GRADED RATHER
THAN QUANTAL RESPONSE

Type of assay                                       Comparable limits of error (P = 0.99)

Chorionic gonadotrophin (graded, within-litters)    80-125%
                        (graded, random animals)    65-155%
Serum gonadotrophin     (graded, within-litters)    90-111%
                        (graded, random animals)    80-125%
Oestrogenic hormone     (quantal, random animals)   78-128%
                        (quantal, within-animal)    88-115%
                        (graded, random animals)    89-114%




when response varies strongly with dose. At worst, a graded response 
gives about twice the information that is given by a quantal response 
(cf. the weighting factor of Gaddum [1933]); often, because of its wider
range of useful observation, it provides much more. It is thus worth
while exercising ingenuity to measure on a continuous scale, such as 
converting from the quantal observation of the death or survival of 
an animal to the graded observation of how long it survives (Emmens, 
[1948]).

Some examples of these principles are shown in Table 1, where, in
various biological assays, it is found that the conversion from between-
animal to within-animal, or within-litter, observations is well worth while,
and that a conversion from a between-animal quantal to a between-animal
graded observation achieved a similar improvement in accuracy. It would
have been very interesting to convert from between-animal quantal 
measurements to within-animal graded measurements in the latter 
example, but this was unfortunately not possible. There are naturally 
some fields in which it is not appropriate to use within-animal com- 
parisons, and where measurement of the characteristics of a population 


is important rather than a striving for accuracy in comparing responses 
to treatment. 
FIGURE 1


DIAGRAMMATIC REPRESENTATION OF THE TREATMENTS IN EXPERIMENT 1. 


The abscissae indicate the time over which the injections were made. The ordinates 
represent the proportion of the total dose given at each injection. Three dose levels of 
gonadotrophin were given for each of the treatment combinations: 4, 12 and 36 µg.
Graded responses
(a) Between animals (Experiment 1)


The response of the uterus of the immature mouse to gonadotrophic 
stimulation provides a simple example of factorial design in physiological 
research. Under the stimulation of the pituitary hormones, the ovaries 
of the mouse produce their own hormones prematurely and these in 
turn cause uterine enlargement. The weight of the uteri of control and 
treated animals may be used to demonstrate the action of the hormones.
It was desired to estimate the optimal conditions for assaying pituitary
extracts for potency of this type (Claringbold and Lamond, [1957]).
From what was already known, it was decided to give 3 injections,
spaced over 72, 48 or 24 hours, with each total dose partitioned so as
to give increasing, equal or decreasing increments. Additionally, 3
dosage levels were given, 4, 12 or 36 µg. of a particular preparation.
All mice were killed and they and their uteri weighed 24 hr. after the
last injection. The test was thus a $3^3$ factorial, with 4 mice per treat-
ment combination, a total of 108 mice. The scheme is shown in Figure 1,
and the analysis of variance of the results in Table 2. 
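
The layout of Experiment 1 (three time bases, three partitions of the dose and three dose levels, with 4 mice per combination) can be enumerated mechanically. The following Python sketch is illustrative only and formed no part of the original work; the variable names are hypothetical. It lists the 27 treatment combinations and the 108 animal allocations.

```python
from itertools import product

# Factor levels as described in the text for Experiment 1.
time_base_hr = (72, 48, 24)
partition = ("increasing", "equal", "decreasing")
dose_ug = (4, 12, 36)
mice_per_cell = 4

# Every treatment combination of the 3 x 3 x 3 factorial.
cells = list(product(time_base_hr, partition, dose_ug))

# One record per mouse: (time base, partition, dose, replicate number).
design = [cell + (rep,) for cell in cells for rep in range(1, mice_per_cell + 1)]

print(len(cells), "treatment combinations;", len(design), "mice")
```

In practice the mice would be allotted to these records at random; that step is omitted here.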

Owing to dependence of variance on response, responses were trans- 
formed to logs. Significant dependence of response on body weight is 


TABLE 2
ANALYSIS OF VARIANCE OF THE RESULTS OF EXPERIMENT 1.

(The interactions listed as "remainder" were tested separately and, on being found not
significant, were pooled to save space. The d.f. of the error mean square are reduced by
four owing to three missing values and one d.f. for covariance correction.)

Source of variation      d.f.     Mean square

Time base                (2)
  Linear                  1       2532.3***
  Quadratic               1        176.0
Partition                (2)
  Linear                  1         93.4
  Quadratic               1       1768.2**
Levels                   (2)
  Quadratic               1        210.0
Interactions             (20)
  [illegible]             1       1575.5*
  Remainder              14        134.9
Error mean square        77        236.5

***P < 0.001, **P < 0.01, *P < 0.05.
usual in such tests, so that covariance corrections were applied. It
will be seen from Table 2 that clear-cut and highly significant main
effects occurred. Only one formally significant interaction was seen
at the 5% level, which among a total of 20 was clearly negligible. The
most sensitive assay method was that with the shortest time base and 
equal injections and this was adopted. All dose-response lines were 
linear in the chosen dose ranges. 


(b) Within animals (Experiment 2) 


The use of within-animal responses depends on the opportunity to 
use the same animal more than once, to treat one of a pair of organs 
differently from the other, or to split a preparation from a single animal 
into several portions. An example of the latter is in the use of semen, 
which may be divided into many portions for simultaneous testing,
each containing millions of sperm cells. Blood is of course another
common body fluid offering the same advantages. White [1956] wished 


to determine whether purification of diluents used for sperm handling
affected motility, and whether, if it did, this could be attributed to the
removal of traces of metals normally present in semen. Two $4 \times 2^3$
factorials were set up, one with ram and one with bull semen. There 
were four ejaculates in each test, within each of which comparisons 


TABLE 4
SUMMARY OF THE ANALYSES OF VARIANCE FOR THE DATA IN TABLE 3
(EXPERIMENT 2) SHOWING VARIANCE RATIOS WITH THE INTERACTION
SQUARE IN ITALICS AT THE BASE OF THE COLUMNS

                                          Variance ratio
Source of variation             d.f.  |  Bull     |  Ram

Between treatments:              7    |  14.21**  |  53.60**
  Between diluents (D)           1    |  82.88**
  Effect of purification (P)     1    |           |  185.31**
  Effect of metals (M)           1    |           |  7.50*
  D × P interaction              1    |           |
  D × M interaction              1    |           |
  P × M interaction              1    |  0.92     |  0.63
  D × P × M interaction          1    |  0.02     |  1.13
Between ejaculates               3    |  10.50**
Residual                        21    |  53       |  14

*P < 0.05, **P < 0.01.
were made of crude vs. purified, fructose vs. glucose-containing diluents,
with or without added trace metals (Cu, Co, Mn, Fe and Zn in typical 
concentrations). Motility was scored according to a subjective system 
of grades at intervals during each test. 

Owing to limitations necessary on the size of the experiment, it 
was decided to use ejaculate-treatment interactions as error. The 
results are shown in Table 3 and the main effects and interactions in 
the $2^3$ factorials are in Table 4. A large and highly significant diluent-
purification interaction accounts for significant main effects due to both
diluents and purification in both species. In addition, in the ram,
smaller but significant effects of metals and a diluent-metal inter-
action are seen. In both species, however, the main outstanding
effect is of purification on the behavior of sperm in the fructose diluent.
In fact, the method of purification partially destroyed the sugar with 
the formation of toxic products. It will be seen from examination of 
Tables 3 and 4 that, even without replication, such a test picks up small 
differences in response to treatment, due to the consistent responses 
from one ejaculate to another. This is typical: a very small but statis-
tically significant effect can be picked up in work of this type, some-
times of no immediately apparent physiological importance.


Quantal responses 


Strictly quantal responses, such as the death or survival of an 
animal, are usually transformed to probits, sometimes to logits or 
angles, rarely otherwise. The probit transformation, despite its logical 
appeal, is by no means always suitable or manageable. Even in 
simple assays, several cycles of approximation may be required. In 
factorially designed tests, complex estimation procedures are necessary
and with increasing numbers of factors become formidable. The
angular transformation is more useful and in most situations gives as 
good a fit as any other (cf. Claringbold, Biggers and Emmens, [1953]). 
For between-animal estimation, therefore, it is frequently of use. 

When quantal responses may be repeated on the same animal, as 
in tests of anaesthetics, convulsants, or the vaginal response to oestro- 
genic hormones, within-animal estimation becomes possible. Few 
responses will usually be available from the same animal, however, and 
application of even angles to responses and response totals from in- 
dividuals becomes far too complex for any but automatic calculation. 
In such circumstances (and even in others) direct computation from 
the 0 or 1 quantal response is defensible and allows for flexibility in 
taking full advantage of modern experimental design. 
(a) Between animals (Experiment 3)


A simple example is taken from Claringbold et al. [1953] of a $4^2$
factorial experiment with 20 animals per group. The action of an in- 
hibitor of oestrogenic hormone (monoiodoacetate) on oestrone is 
measured quantally according to whether ovariectomised mice respond 
positively or do not. The design and results as percentage reactors 
are shown in Table 5. There were 4 doses of each compound, each 


TABLE 5

EFFECT OF LOCALLY ADMINISTERED MONOIODOACETATE AND OESTRONE ON
THE PERCENTAGE RESPONSE OF OVARIECTOMISED MICE (EXPERIMENT 3)

                      Monoiodoacetate (µg)

Oestrone
                      25      15      15       0
                      25      15      20       5
  8(2)                45      35      15      15
 16(3)                70      50      35      20

Number of animals per group = 20. *Logarithmic coding shown as a subscript in parenthesis.

double the previous one. Probit analysis of this test would follow
the example given by Finney [1952] except that there were no reactors
among controls. This probit plane technique, with only two variables, 
takes about 6 hours to compute and check on a hand machine. The 
corresponding results with the angular transformation take less than 
1/2 hr. of computation on the same machine. In the example given, 
which is typical, there are no significant interactions (Table 6). A 
regression plane was fitted by eye and provisional values obtained and 
corrected as in Fisher and Yates [1948] to produce the working angles 
on which a standard analysis of variance was made. An iterative 
procedure was continued until the differences between successive 
regression coefficients were less than 1/10th of the standard errors of 
the final coefficients. A second computation (not tabulated) in which 
the empirical responses were converted to angles, corrections for zero 
response inserted from Table 1 of Claringbold et al. [1953] and an
analysis of variance carried out gave practically identical results. 
Thus, the final equations were:


$$Y = 7.01X_1 - 7.37X_2 + 28.98 \qquad (1)$$


and
TABLE 6
ANALYSES OF VARIANCE OF TRANSFORMED DATA OF TABLE 5 (EXPERIMENT 3)

Source of variation      d.f.    Sum of squares    Mean square    F

Oestrone                 (3)     (1030.3)
  Linear                  1        982.1                          23.9***
  Quadratic               1         47.2                           1.2
  Cubic                   1          1.0                           0.0
Monoiodoacetate          (3)     (1123.0)
  Linear                  1       1087.1
  Quadratic               1         11.4                           0.3
  Cubic                   1         24.5                           0.6
Interactions              9        208.4            23.2           0.6
Theoretical variance                                41.0

***P < 0.001.
$$Y = 7.00X_1 - 7.23X_2 + 28.94 \qquad (2)$$


respectively, where $Y$ is the angle of response, $X_1$ is log dose of oestrone,
and $X_2$ is log dose of monoiodoacetate. This illustrates the findings of
Claringbold et al. [1953] that in general, there is no advantage in an 
iterative procedure with transformation of expected responses unless 
particularly high accuracy is desired. Thus the Gaddum [1933] or 
Eisenhart [1947] method of transforming the observed response may 
often be sufficient; if not, it may be used as the first approximation to
be followed usually by a single cycle of the maximum likelihood solution. 
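
The non-iterative computation described above (angles taken directly from the observed percentages) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the author's program: the percentages are those read from Table 5; a zero response is replaced by 1/(4n), which is a common convention assumed here in place of the corrections of Table 1 of Claringbold et al. [1953]; and the log doses are assumed to be coded in equally spaced, centred units (-1.5, -0.5, 0.5, 1.5), each dose being double the previous one.

```python
import math

N_PER_GROUP = 20

# Percentage reactors as read from Table 5: rows are increasing oestrone
# doses, columns are increasing monoiodoacetate doses.
PERCENT = [
    [25, 15, 15, 0],
    [25, 15, 20, 5],
    [45, 35, 15, 15],
    [70, 50, 35, 20],
]

# Assumed coding: four equally spaced log doses, centred on zero.
CODES = [-1.5, -0.5, 0.5, 1.5]


def angle(percent, n):
    """Angular transformation in degrees; 0% and 100% are pulled in by 1/(4n)."""
    p = percent / 100.0
    p = min(max(p, 1.0 / (4 * n)), 1.0 - 1.0 / (4 * n))
    return math.degrees(math.asin(math.sqrt(p)))


angles = [[angle(v, N_PER_GROUP) for v in row] for row in PERCENT]

# The design is balanced and the codes are centred, so the two slopes of the
# regression plane are independent and each is sum(x * y) / sum(x * x).
sxx = 4 * sum(c * c for c in CODES)
b_oestrone = sum(CODES[i] * angles[i][j] for i in range(4) for j in range(4)) / sxx
b_inhibitor = sum(CODES[j] * angles[i][j] for i in range(4) for j in range(4)) / sxx
intercept = sum(map(sum, angles)) / 16

print(round(b_oestrone, 2), round(b_inhibitor, 2), round(intercept, 2))
```

Under these assumptions the fitted plane agrees closely with equation (2), which is the sense in which the simple transformation of observed responses is said to suffice.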


(b) Within animals (Experiment 4) 


Adequate analysis of within-animal quantal responses was first 
elaborated by Claringbold [1956], using a cross-over design. He showed 
that an analysis of variance may be performed on quantal data with 
only one animal per treatment group, if the expected responses lie 
between 5 and 95%. In this test, an assay of oestrogen in ovariecto- 
mised mice, the approximate sensitivity of each animal was located 
by preliminary trials, and a 4-point assay performed as a series of Latin 
squares with doses of the standard and unknown in constant ratio, 
but scaled to individual sensitivity (Table 7). The procedure avoids 
regions of 0 or 100% response, and in this particular instance, no animal 
responded negatively throughout, and only two responded positively 
throughout. 
TABLE 7


RESULTS OBTAINED IN A FOUR-POINT CROSS-OVER ASSAY WITH
QUANTAL RESPONSES (EXPERIMENT 4)


Latin    Mouse   Mean            Order of tests:           Responses to:
square   No.     sensitivity     1st   2nd   3rd   4th     S_L   S_H   U_L   U_H
                 (10^-? µg)

I         1        4             S_L   S_H   U_L   U_H      0     1     0     1
          2        2             U_L   S_L   U_H   S_H      0     1     1     1
          3        8             U_H   U_L   S_H   S_L      0     1     0     1
          4       23             S_H   U_H   S_L   U_L      0     1     1     0
II        5        8             S_H   U_H   S_L   U_L      0     1     0     1
          6        6             U_L   S_H   U_H   S_L      0     1     0     1
          7        4             U_H   S_L   U_L   S_H      0     0     1     1
          8        4             S_L   U_L   S_H   U_H      0     1     0     1
III       9        3             U_H   S_H   S_L   U_L      0     0     0     1
         10       11             S_H   U_H   U_L   S_L      0     1     0     1
         11        3             S_L   U_L   S_H   U_H      0     1     1     1
         12       11             U_L   S_L   U_H   S_H      0     1     1     1
IV       13        8             U_H   S_H   S_L   U_L      0     1     0     1
         14       16             U_L   S_L   S_H   U_H      1     1     1     1
         15        4             S_L   U_H   U_L   S_H      0     1     0     1
         16        8             S_H   U_L   U_H   S_L      0     0     1     1
V        17        3             U_L   S_L   U_H   S_H      0     1     0     1
         18       23             U_H   U_L   S_H   S_L      0     1     1     1
         19        1             S_H   U_H   S_L   U_L      0     1     0     1
         20       23             S_L   S_H   U_L   U_H      0     1     1     1
VI       21        6             S_L   U_H   S_H   U_L      1     1     1     1
         22        1             U_H   S_H   U_L   S_L      0     0     1     1
         23        3             U_L   S_L   U_H   S_H      0     1     0     1
         24       11             S_H   U_L   S_L   U_H      0     1     0     1


The analysis of variance is shown in Table 8, using the data as they 
stand in Table 7. The results are pleasing, in that there was no de- 
parture from parallelism and no significant differences between animals, 
indicating that mean sensitivities had been well located. In a probit 
analysis of such data, about 30 constants would require estimation 
and the solution of the equations is impracticable. It may well be 
objected that in this analysis, weighting is ignored. However, assump- 
TABLE 8
ANALYSIS OF VARIANCE OF THE DATA OF TABLE 7 (EXPERIMENT 4)

Source of variation      d.f.     Mean square

Animals                   23      0.12
Times                      3      0.05
Slope                      1      9.38***
Difference                 1      1.50**
Parallelism                1      0.38
Error                     66      0.138

***P < 0.001, **P < 0.01.


TABLE 9
LAYOUT AND RESULTS OF EXPERIMENT 5 WITH 0, 1 SCORING;
FOR EXPLANATION SEE TEXT.
DOSAGE GROUPS ARE IN BRACKETS


Substance            Oestrogen                        Pro-oestrogen
Test                 1      2      3      4           1      2      3      4

Groups I             9(2)   17(4)  12(3)  3(1)        4(1)   13(3)  17(4)  11(2)
       II            19(4)  14(3)  7(1)   11(2)       18(4)  12(2)  12(1)  17(3)
       III           8(1)   14(2)  17(4)  11(3)       10(2)  7(1)   20(3)  17(4)
       IV            8(3)   10(1)  9(2)   17(4)       15(3)  18(4)  16(2)  10(1)

Substance            Pro-oestrogen                    Oestrogen

Groups V             9(1)   20(3)  15(2)  22(4)       19(4)  8(2)   ..     7(1)
       VI            14(2)  12(1)  21(4)  19(3)       18(3)  14(4)  6(1)   11(2)
       VII           15(3)  23(4)  ..     16(2)       11(1)  ..     14(2)  13(4)
       VIII          15(4)  19(2)  16(3)  11(1)       5(2)   5(1)   16(4)  13(3)


SUMMARY
Doses                          1      2      3      4
Response to oestrogen          57     81     98     132
Response to pro-oestrogen      72     113    135    151
tion of equal weight when this is not true simply increases the variance
of estimates and cannot bias them. This increase was not alarming,
for in fact the cross-over design as just described required only one
quarter of the observations usually needed for the same precision.
Seven other tests described by Claringbold [1956] confirm these findings.
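
The direct analysis of 0 or 1 responses in Experiment 4 is simple enough to set out in full. The Python sketch below is illustrative only. It assumes the orders of tests and responses of Table 7 as reconstructed from the Latin-square structure; in particular the four partly illegible responses to the low dose of standard for mice 5 to 8 are taken as 0, an assumption chosen because it reproduces the published mean squares. Treatments are coded a = S_L, b = S_H, c = U_L, d = U_H, which is a hypothetical shorthand introduced here.

```python
# Order of tests for mice 1-24 (a = S_L, b = S_H, c = U_L, d = U_H).
ORDERS = ("abcd cadb dcba bdac  bdac cbda dacb acbd  dbac bdca acbd cadb  "
          "dbac cabd adcb bcda  cadb dcba bdac abcd  adbc dbca cadb bcad").split()

# Responses of each mouse to S_L, S_H, U_L, U_H in that order (assumed
# reading of Table 7; the first digit for mice 5-8 is an assumption).
RESPONSES = ("0101 0111 0101 0110  0101 0101 0011 0101  0001 0101 0111 0111  "
             "0101 1111 0101 0011  0101 0111 0101 0111  1111 0011 0101 0101").split()

y = [[int(ch) for ch in r] for r in RESPONSES]
n_mice = len(y)
n_obs = 4 * n_mice
correction = sum(map(sum, y)) ** 2 / n_obs

ss_total = sum(v * v for row in y for v in row) - correction
ss_animals = sum(sum(row) ** 2 for row in y) / 4 - correction

# Treatment totals and the three orthogonal contrasts of a 4-point assay.
treat = [sum(row[k] for row in y) for k in range(4)]


def contrast_ss(coef):
    value = sum(c * t for c, t in zip(coef, treat))
    return value ** 2 / (n_mice * sum(c * c for c in coef))


ss_slope = contrast_ss((-1, 1, -1, 1))
ss_difference = contrast_ss((-1, -1, 1, 1))
ss_parallelism = contrast_ss((1, -1, -1, 1))

# Totals for the 1st to 4th occasion of testing.
time_totals = [sum(row["abcd".index(order[k])] for order, row in zip(ORDERS, y))
               for k in range(4)]
ss_times = sum(t * t for t in time_totals) / n_mice - correction

ss_error = (ss_total - ss_animals - ss_times
            - ss_slope - ss_difference - ss_parallelism)

mean_squares = {
    "animals": ss_animals / 23,
    "times": ss_times / 3,
    "slope": ss_slope,
    "difference": ss_difference,
    "parallelism": ss_parallelism,
    "error": ss_error / 66,
}
for source, value in mean_squares.items():
    print(f"{source:12s} {value:6.3f}")
```

The mean squares so obtained may be compared with those of Table 8.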


(c) Combined tests (Experiment 5)


In order to explore this simple method in different circumstances, 
a complex assay of oestrogens was designed by Emmens [1957] and
analyzed by angles, 0, 1 scoring and 0, 1, 2 (semi-quantal) scoring. 
The latter was possible because two observations at an interval were 
taken from every animal after each treatment, and the responses could 
be scored as 0, 1 or 2 according to whether no reactions, 1 or 2 reactions 
were observed. There was reason to believe that a combination of 
whether an animal reacted together with a measure of length of reaction 
would increase precision. 


The design was based on four 4 × 4 Latin squares, using 8 groups each


TABLE 10
RESULTS OF EXPERIMENT 5 WITH 0, 1, 2 SCORING

Substance            Oestrogen                        Pro-oestrogen
Test                 1      2      3      4           1      2      3      4

Groups I             11(2)  27(4)  18(3)  3(1)        5(1)   21(3)  28(4)  15(2)
       II            30(4)  21(3)  8(1)   13(2)       31(4)  20(2)  14(1)  24(3)
       III           10(1)  18(2)  25(4)  17(3)       13(2)  11(1)  30(3)  28(4)
       IV            16(3)  12(1)  13(2)  24(4)       23(3)  33(4)  22(2)  14(1)

Substance            Pro-oestrogen                    Oestrogen

Groups V             11(1)  36(3)  22(2)  39(4)       26(4)  9(2)   18(3)  10(1)
       VI            20(2)  17(1)  39(4)  29(3)       24(3)  19(4)  ..     15(2)
       VII           24(3)  39(4)  10(1)  26(2)       15(1)  ..     ..     ..
       VIII          24(4)  31(2)  27(3)  14(1)       5(2)   6(1)   24(4)  19(3)

SUMMARY
Doses                          1      2      3      4
Response to oestrogen          71     100    144    192
Response to pro-oestrogen      96     169    214    261
of 25 mice, without prior estimation of individual sensitivities. Each
group was reduced either by death or by random rejection of a mouse 
to 24 for analysis. The lay-out of the test and the results scored 
quantally (0, 1) are shown in Table 9. The slope for the pro-oestrogen, 
a substance transformed by the body after injection to an oestrogen, 
was expected to differ from that for the oestrogen, and dose intervals 
were scaled accordingly. The corresponding results for 0, 1, 2 scoring 
are shown in Table 10. 

Analyses by the transformations are shown in Tables 11, 12 and 13. 
If the standard maximum likelihood version of the angular transfor- 
mation were applied to within-animal data, each animal would require 
a constant to describe its mean sensitivity and thus even angles would
give an impracticable computation. Group responses were therefore 


TABLE 11


ANALYSIS OF VARIANCE FOR ANGULAR TRANSFORMATION DETERMINED
FROM GROUP RESPONSES (24 MICE/GROUP) IN EXPERIMENT 5


Source of variation       d.f.    Sum of squares    Mean square    F

Tests                      7        44.17             6.31          4.2**
Doses                      7       268.45            38.35         25.7**
Groups                     7        23.25             3.32          2.2*
Interaction               42        61.29             1.46          1.0
Theoretical variance                                  1.49

*P < 0.05, **P < 0.001.
TABLE 12


ANALYSIS OF VARIANCE FOR (0 OR 1) SCORING, BASED ON INDIVIDUAL
SCORES IN EXPERIMENT 5


Source of variation                   d.f.    Sum of squares    Mean square    F

Tests                                    7         6.39            0.91         4.7***
Doses                                    7        40.77            5.82        29.8***
Interaction                             42         8.89            0.21         1.1
Residual (= error 1)                  1288       251.85            0.20
Groups                                   7         3.13            0.45         1.2
Animals within groups (= error 2)      184        69.74            0.38


***P < 0.001.
TABLE 13


ANALYSIS OF VARIANCE FOR (0, 1 OR 2) SCORING, BASED ON
INDIVIDUAL SCORES IN EXPERIMENT 5


Source of variation                   d.f.    Sum of squares    Mean square    F

Tests                                    7        17.29            2.47         4.9***
Doses                                    7       156.04           22.29        43
Interaction                             42        22.64            0.54         1.1
Residual (= error 1)                  1288       652.66            0.51
Groups                                   7         6.70            0.96         0.9
Animals within groups (= error 2)      184       193.30            1.05

***P < 0.001.


used to produce the analysis in Table 11, and it is seen that the inter- 
action of doses and times, with 42 degrees of freedom, gives an excellent 
estimate of the theoretical variance. However, differences between 
group responses are significantly more variable than expectation, and 
strictly speaking, any estimates such as relative potency should be made 
with a heterogeneity factor of 2.2 and only 7 degrees of freedom. 

The analyses of variance for 0, 1 or 0, 1, 2 scoring in Tables 12 and 
13 resemble that in Table 11, with the addition of sums of squares for 
animals within groups, and a residual variance. Comparisons between 
columns in Table 9 have a precision based on this residual (within- 
animal) variance, and comprise the essential parts of the test. Com- 
parisons between rows have a precision based on the mean square for 
animals within groups (between-animal) which is now seen not to 
differ significantly from that for groups. Heterogeneity is thus isolated 
in these analyses and does not interfere with the use of within-animal 
estimates of error, in this case half the magnitude of the between-animal 
estimates. Justification for the method is supplied, not only by the 
linearity of the dose-response lines, but by the fact that with 0, 1 scoring, 
the theoretical error must be between 0.19 and 0.25, since nearly all 
group responses were between 25 and 75% (compare with residual in 
Table 12 of 0.20) and with 0, 1, 2 scoring it must be close to 0.67 (com- 
pare residual in Table 13 of 0.51). 
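
The bounds quoted for 0, 1 scoring follow from the variance p(1 - p) of a single 0 or 1 observation. The short Python check below is a sketch added for illustration, with a hypothetical function name; it evaluates that variance at the ends and the middle of the 25 to 75% range.

```python
def binary_score_variance(p):
    """Variance of a single 0 or 1 score when the chance of a reaction is p."""
    return p * (1.0 - p)


# Smallest value in the range 25-75% (at either end) and largest (at 50%).
lowest = binary_score_variance(0.25)
highest = binary_score_variance(0.50)
print(lowest, highest)
```

Rounded to two places the smaller figure is the 0.19 quoted in the text, and the residual of 0.20 in Table 12 lies between the two.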

It was a matter for conjecture to what extent within-animal 
estimates of error might be useful in a test without prior individual 
estimates of sensitivity and extended over a 16-week period as was 
this one. It is reassuring to see a two-fold gain in precision even in 
such circumstances. It will be noted that the somewhat more complex
0, 1, 2 scoring differentiated more strongly between doses. 


Computing 


All of the computations above were made on desk machines, although 
many of the experiments in the Department are now processed by an 
automatic digital computer (Silliac). Automatic computers make 
possible the use of complex designs, both in planning and analysis, and 
programmes are becoming available for handling almost any type of 
experiment. The full development of statistics in physiological research 
must undoubtedly depend on the availability of such computers, so 
that the computation of results will take only a fraction of the time 
required to obtain the results, instead, as would often now be the case, 
taking much longer than it does to do the experiment and thus slowing 
research. 

The use of automatic computers makes feasible the routine use of 
non-linear estimation procedures such as the maximum likelihood
solution with the probit transform. Since, however, such computation 
is made by iterating a linear approximation the computing time is a 
multiple of the time taken for an approximate solution using a linear 
model. It would appear that cost of computing time is a limiting 
factor in governing our choice of a method of analysis, and there seems 
little point in using a sophisticated model when equivalent results in 
practice are obtained with a simple model. 
RANK ANALYSIS OF INCOMPLETE BLOCK DESIGNS: A
METHOD OF PAIRED COMPARISONS EMPLOYING
UNEQUAL REPETITIONS ON PAIRS
OTTO DYKSTRA, JR.


General Foods Research Center 
Tarrytown, New York, U.S.A. 


1. Introduction 


Rank analysis of incomplete block designs using a method of paired
comparisons is covered in the references. In these papers it is assumed
that there is an experiment consisting of $t$ treatments with $n$ repetitions
of each of the $t(t - 1)/2$ pairs $(i, j)$. It is furthermore postulated that
associated with each of the treatments, denoted by $T_1, \cdots, T_t$, there
exist parameters, $\pi_i$ for $T_i$, such that $\pi_i \ge 0$ and $\sum_{i=1}^{t} \pi_i = 1$. The
parameters are further defined with the probability statement that,
if $X_i$ generally denotes an observation on a sample of $T_i$,


$$P(X_i > X_j) = \frac{\pi_i}{\pi_i + \pi_j} \qquad (1)$$


in the comparison of $T_i$ with $T_j$, $i \ne j$.
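
As a small numerical illustration of the probability statement (1), the following Python sketch evaluates it for a set of hypothetical parameter values; the function name and the values are assumptions made for the example and do not come from the paper.

```python
def preference_probability(pi_i, pi_j):
    """P(X_i > X_j) under statement (1) for treatment parameters pi_i, pi_j."""
    return pi_i / (pi_i + pi_j)


# Hypothetical parameters for t = 3 treatments; they are non-negative and sum to 1.
pi = [0.5, 0.3, 0.2]

p_12 = preference_probability(pi[0], pi[1])
p_21 = preference_probability(pi[1], pi[0])
print(p_12, p_21)
```

The two probabilities for a pair are complementary, so only the ratios of the parameters matter; the restriction that they sum to 1 merely fixes the scale.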

The formulas for the maximum likelihood estimates of the $\pi_i$ and
for the likelihood ratio tests are given by Bradley and Terry [1952],
who also give tables for small treatment and sample sizes. Additional
tables were given by Bradley [1954b]. Dykstra [1956] provides a quick
and easy method of obtaining first estimates of the $\pi_i$ regardless of the
sizes of $t$ and $n$. Bradley [1954a] develops a procedure for testing the
appropriateness of the model. Bradley [1955] also provides the formulas 
for the asymptotic variances and covariances of the maximum likelihood 
estimates. 

In all of the above papers it is assumed that there are $n$ repetitions
on each of the $t(t - 1)/2$ pairs $(i, j)$. The main objective of this paper
is to develop the analogous method of analysis when there are unequal
numbers of repetitions on the pairs. A secondary objective is to discuss
the possibility of running balanced sub-sets of the totality of pairs, i.e.,
omitting some pairs completely.


2. The Likelihood Function 
We follow the text by Bradley and Terry [1952]. The probability
of the observed result in $n_{ij}$ repetitions on the comparison of the treat-
ments $i$ and $j$ is

$$\pi_i^{a_{ij}} \pi_j^{a_{ji}} (\pi_i + \pi_j)^{-n_{ij}} \qquad (2)$$

with the definition

$$a_{ij} = 2n_{ij} - \sum_{k=1}^{n_{ij}} r_{ijk} \qquad (3)$$

where $r_{ijk} = 1$ if $X_i > X_j$ and $r_{ijk} = 2$ if $X_j > X_i$ in the $k$th repetition
of the pairs $(i, j)$. Note then that $a_{ij}$ is the number of times $T_i$ ranks
ahead of $T_j$.
Multiplying the appropriate expressions for all repetitions of all
pairs we obtain the expression $L(\pi_i)$ for the general likelihood function.
Thus

$$L(\pi_i) = \prod_{i} \pi_i^{a_i} \prod_{i<j} (\pi_i + \pi_j)^{-n_{ij}} \qquad (4)$$

where

$$a_i = \sum_{j} a_{ij}. \qquad (5)$$

In (5) the index of summation $j$ takes all values except $i$. This is
analogous to equation (1) given by Bradley and Terry, which is obtained
when all $n_{ij}$ are equal to $n$. In the above and the subsequent likelihood
functions we shall assume probability independence between pairs of
treatments and between repetitions.
3. Likelihood Ratio Tests and Estimation 
We test the null hypothesis

$$H_0: \pi_i = 1/t, \quad i = 1, \cdots, t, \qquad (6)$$

against the Bradley and Terry [1952] alternative hypothesis

Case (i) $H_1$: no $\pi_i$ is assumed equal to any $\pi_j$ $(i \ne j)$. (7)

Differentiating the natural logarithm of (4) with respect to $\pi_i$, using
appropriate Lagrange multipliers, and denoting the maximum likelihood
estimates by $p_i$, we obtain

$$\frac{a_i}{p_i} - \sum_{j} \frac{n_{ij}}{p_i + p_j} = 0, \quad i = 1, \cdots, t, \qquad (8)$$

with the usual restriction

$$\sum_{i} p_i = 1. \qquad (9)$$
For purposes of computing the maximum likelihood estimates $p_i$,
we use equation (8) in the form

$$p_i = a_i \Big/ \Big[\sum_{j} n_{ij}/(p_i + p_j)\Big]. \qquad (10)$$

Using any method of obtaining initial $p_i$, the resulting first estimates
are substituted into the right side of (10) and second estimates are
then obtained, the second estimates being resubstituted, and so on,
until the equalities hold. Reasonable first estimates may be obtained
by assuming that the $p_i$'s are not too different from each other, so that
we may put $p_j = p'$ for all $j \ne i$. Then $p' = (1 - p_i)/(t - 1)$ and we
obtain

$$p_i = a_i \Big/ \Big[(t - 1) \sum_{j} n_{ij} - (t - 2) a_i\Big]. \qquad (11)$$
The use of (11) for obtaining first estimates of the $p_i$ may result in
many iterations, if the assumption that the $p_i$'s are not too different
is unreasonable. As an alternative method of obtaining first estimates,
all $a_{ij}$ may be adjusted so that $a_{ij} + a_{ji} = \bar{n} = 2 \sum_{i<j} n_{ij}/t(t - 1)$.
This may be done by adding $(\bar{n} - n_{ij})/2$ to each $a_{ij}$ and $a_{ji}$ or by
multiplying each $a_{ij}$ and $a_{ji}$ by $\bar{n}/n_{ij}$. The procedure previously
described by the author [1956] may now be applied.
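
The iterative computation may be sketched as follows in Python. This is an illustration under stated assumptions and not the author's own routine. The array `wins[i][j]` is taken to hold the number of times treatment i ranks ahead of treatment j, so that the number of repetitions on a pair is `wins[i][j] + wins[j][i]`. Starting values are taken from the first-estimate formula $p_i = a_i/[(t - 1)\sum_j n_{ij} - (t - 2)a_i]$, and each cycle divides $a_i$ by $\sum_j n_{ij}/(p_i + p_j)$ and then rescales the estimates to sum to 1. The rescaling at every cycle, the function names and the counts are assumptions made for the example.

```python
def first_estimates(wins):
    """Starting values a_i / [(t - 1) n_i - (t - 2) a_i], rescaled to sum to 1."""
    t = len(wins)
    estimates = []
    for i in range(t):
        a_i = sum(wins[i][j] for j in range(t) if j != i)
        n_i = sum(wins[i][j] + wins[j][i] for j in range(t) if j != i)
        estimates.append(a_i / ((t - 1) * n_i - (t - 2) * a_i))
    total = sum(estimates)
    return [p / total for p in estimates]


def maximum_likelihood(wins, tol=1e-12, max_iter=10000):
    """Resubstitute until successive estimates agree to within tol."""
    t = len(wins)
    a = [sum(wins[i][j] for j in range(t) if j != i) for i in range(t)]
    p = first_estimates(wins)
    for _ in range(max_iter):
        new = [
            a[i] / sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                       for j in range(t) if j != i)
            for i in range(t)
        ]
        total = sum(new)
        new = [x / total for x in new]  # keep the estimates summing to 1
        if max(abs(x - y) for x, y in zip(new, p)) < tol:
            return new
        p = new
    return p


# Hypothetical counts with unequal repetitions: 8, 7 and 5 on the three pairs.
wins = [[0, 5, 5],
        [3, 0, 3],
        [2, 2, 0]]
p_hat = maximum_likelihood(wins)
print([round(p, 4) for p in p_hat])
```

The sketch assumes that every treatment has been compared at least once and that no pair of treatments both have zero estimates; no safeguards are included for such cases.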

For the null hypothesis (6) and the alternative (7) the likelihood
ratio test depends on the statistic¹


$$B_1 = \sum_{i<j} n_{ij} \log (p_i + p_j) - \sum_{i} a_i \log p_i. \qquad (12)$$


Then, letting $\lambda$ be the likelihood ratio,

$$-2 \ln \lambda = 2\Big(\sum_{i<j} n_{ij}\Big) \ln 2 - 2B_1 \ln 10 \qquad (13)$$

is distributed in the limit as $\chi^2$ with $t - 1$ degrees of freedom when all
$n_{ij}$ become large.
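
A Python sketch of the test statistic follows, offered as an illustration only. It assumes the same layout of counts as the paper's $a_{ij}$ (`wins[i][j]` is the number of times treatment i ranks ahead of treatment j), computes the estimates by resubstitution from equal starting values, forms $B_1 = \sum_{i<j} n_{ij} \log(p_i + p_j) - \sum_i a_i \log p_i$ with common logarithms, and then $-2 \ln \lambda = 2(\sum_{i<j} n_{ij}) \ln 2 - 2B_1 \ln 10$. The function names and the counts are hypothetical, and each treatment is assumed to rank first at least once so that all logarithms are defined.

```python
import math


def maximum_likelihood(wins, tol=1e-12, max_iter=10000):
    """Estimates p_i by resubstitution, rescaled each cycle to sum to 1."""
    t = len(wins)
    a = [sum(wins[i][j] for j in range(t) if j != i) for i in range(t)]
    p = [1.0 / t] * t
    for _ in range(max_iter):
        new = [
            a[i] / sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                       for j in range(t) if j != i)
            for i in range(t)
        ]
        total = sum(new)
        new = [x / total for x in new]
        if max(abs(x - y) for x, y in zip(new, p)) < tol:
            return new
        p = new
    return p


def likelihood_ratio_statistic(wins):
    """Return (-2 ln lambda, degrees of freedom) for the hypothesis of equal parameters."""
    t = len(wins)
    p = maximum_likelihood(wins)
    pairs = [(i, j) for i in range(t) for j in range(i + 1, t)]
    n = {(i, j): wins[i][j] + wins[j][i] for i, j in pairs}
    a = [sum(wins[i][j] for j in range(t) if j != i) for i in range(t)]
    b1 = (sum(n[i, j] * math.log10(p[i] + p[j]) for i, j in pairs)
          - sum(a[i] * math.log10(p[i]) for i in range(t)))
    statistic = 2 * sum(n.values()) * math.log(2) - 2 * b1 * math.log(10)
    return statistic, t - 1


# Hypothetical counts with 10, 10 and 6 repetitions on the three pairs.
wins = [[0, 7, 9],
        [3, 0, 4],
        [1, 2, 0]]
statistic, df = likelihood_ratio_statistic(wins)
print(round(statistic, 3), df)
```

The value obtained would be referred to tables of $\chi^2$ with the stated degrees of freedom.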


4. Appropriateness of the Model 


Equation (1) defined the probability P(X, > ,) indicating the 
role of the 4 parameters , , We could have defined — 1)/2 
parameters 7,, such that. 


P(X, > X,) = Fi; (11) 
and these parameters could have been estimated by 


a,,/N,; (15) 


iWe shall uge loy and In to represent logarithins to base 10 and base ¢ respectively. 
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where j,; is the maximum likelihood estimator of 7;; , and a;; is the 
number of times 7'; ranks above 7; in n;; repetitions. The likelihood 
tunction is written as 
= 3 +a;; , tii = 1. (16) 
i<j 
We assume probability independence between pairs and between 
repetitions. We are interested in the test: 

$H_0: \pi_{ij} = \pi_i/(\pi_i + \pi_j)$ for all $i$ and $j$, 
$H_a: \pi_{ij} \ne \pi_i/(\pi_i + \pi_j)$ for some $i$ and $j$. 

This yields 

$-2 \ln \lambda = 2 \ln 10 \big(\sum_{i \ne j} a_{ij} \log a_{ij} - \sum_{i<j} n_{ij} \log n_{ij} + B_1\big),$ (17)

which is distributed in the limit as $\chi^2$ with $(t - 1)(t - 2)/2$ degrees of 
freedom when all $n_{ij}$ become large. $B_1$ is defined by (12). The zeros 
which may enter (17) when $n_{ij} = 0$ are ignored, but, of course, one 
degree of freedom is lost for each $n_{ij}$ equal to zero. 
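
A minimal sketch of the test (17), including the rule for pairs with $n_{ij} = 0$, is given below. It uses the Section 8 counts and estimates; the function names are hypothetical, and the convention $0 \log 0 = 0$ is an assumption made here for cells with no preferences.

```python
import math

# (preferences for T_i, preferences for T_j) on each pair (i, j).
wins = {(1, 2): (28, 112), (1, 3): (15, 39), (1, 4): (23, 34),
        (2, 3): (46, 17), (2, 4): (47, 11), (3, 4): (0, 0)}
# Maximum likelihood estimates as reported in Section 8.
p = {1: 0.1082, 2: 0.5193, 3: 0.2294, 4: 0.1431}


def xlog(x):
    """x * log10(x), taking 0 * log 0 as 0 (assumption for empty cells)."""
    return x * math.log10(x) if x > 0 else 0.0


def model_test(wins, p):
    """Return (-2 ln lambda, degrees of freedom) for the test of the model."""
    t = len(p)
    totals = {i: 0 for i in p}
    saturated = 0.0   # sum a_ij log a_ij - sum n_ij log n_ij
    b1 = 0.0          # statistic (12)
    empty_pairs = 0
    for (i, j), (a_ij, a_ji) in wins.items():
        n_ij = a_ij + a_ji
        if n_ij == 0:            # pair not run: ignored, one d.f. lost
            empty_pairs += 1
            continue
        totals[i] += a_ij
        totals[j] += a_ji
        saturated += xlog(a_ij) + xlog(a_ji) - xlog(n_ij)
        b1 += n_ij * math.log10(p[i] + p[j])
    b1 -= sum(totals[i] * math.log10(p[i]) for i in p)
    chi2 = 2 * math.log(10) * (saturated + b1)
    df = (t - 1) * (t - 2) // 2 - empty_pairs
    return chi2, df


chi2, df = model_test(wins, p)
print(round(chi2, 2), df)
```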

The test for appropriateness of the model, equation (17), must be 
interpreted very carefully when a number of $n_{ij}$ are zero. Suppose, for 
example, that a number of judgments are obtained on the 6 pairs 
among 4 samples. There are 3 degrees of freedom available for the 
test statistic (13) and 3 d.f. for the test of the model. Suppose, too, 
that judgments are obtained on the comparison of one of these 4 samples 
against a fifth sample. The degrees of freedom for the test statistic 
(13) increase to 4, but there are still only 3 d.f. for the test of the model. 
In this case the statistic (17) tests the model only for the first four 
samples. 

There is a possibility that the test statistic (17) may be distorted 
when some $n_{ij}$ are small. We have no logical basis for pooling in this 
case as is often done in $\chi^2$-tests so that a better procedure may be to 
set the disproportionately small $n_{ij}$ to zero, assuming that these pairs 
had not been compared and thus omitting their contributions to $\chi^2$ in 
(17). This concept is perhaps clearer in terms of the second of two forms 
of the $\chi^2$-statistic for the test of the model as given by Bradley. This 
second form in our notation gives 

$\chi^2 = \sum_{i \ne j} (a_{ij} - a'_{ij})^2 / a'_{ij}$ (18)

with $a'_{ij} = n_{ij} p_i/(p_i + p_j)$ and $a_{ij} = n_{ij} p_{ij}$. Rewriting this yields 

$\chi^2 = \sum_{i \ne j} n_{ij} \{p_{ij} - [p_i/(p_i + p_j)]\}^2 (p_i + p_j)/p_i .$ (19)

Terms in the sum for $\chi^2$ with small $n_{ij}$ may be omitted and omission 
of each pair of terms associated with a given $n_{ij}$ ($n_{ij} = n_{ji}$) results in 
the loss of one degree of freedom. The suggestion at the beginning of 
this paragraph in regard to the form (17) is the equivalent rule. 


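5. Precision of the $p_i$ 


The $a_{ij}$ have expectation $n_{ij}\pi_i/(\pi_i + \pi_j)$ and variances 
$n_{ij}\pi_i\pi_j/(\pi_i + \pi_j)^2$. The $a_{ij}$ making up the sum $a_i$ are assumed to be independ- 
ent in probability, so that 

$E(a_i) = \pi_i \sum_h' n_{ih}(\pi_i + \pi_h)^{-1}, \quad i = 1, \ldots, t,$ (20)

where $\sum_h'$ denotes summation with $h = 1, \ldots, t$ except that $h \ne i$, 

$V(a_i) = \pi_i \sum_h' n_{ih}\pi_h(\pi_i + \pi_h)^{-2} = \pi_i^2 \lambda_{ii}$, say, $i = 1, \ldots, t$, (21)

$\mathrm{Cov}(a_i, a_j) = -n_{ij}\pi_i\pi_j(\pi_i + \pi_j)^{-2} = \pi_i\pi_j\lambda_{ij}$, say, $i \ne j$; $i, j = 1, \ldots, t$, (22)

where $\lambda_{ii}$ and $\lambda_{ij}$ are defined by 

$\lambda_{ii} = (1/\pi_i) \sum_h' n_{ih}\pi_h(\pi_i + \pi_h)^{-2}, \quad i = 1, \ldots, t,$ 
$\lambda_{ij} = -n_{ij}(\pi_i + \pi_j)^{-2}, \quad i \ne j; \; i, j = 1, \ldots, t.$ (23)

These definitions differ somewhat from those of Bradley [1955]. We 
will also define 

$\lambda'_{ij} = \lambda_{ij} - \lambda_{it} - \lambda_{jt} + \lambda_{tt}, \quad i, j = 1, \ldots, t - 1.$ (24)

With the proof supplied by Gart and Bradley and from the fact 
that the maximum likelihood estimates of the $\pi_i$ have joint limiting 
normal distributions, we may take $(p_1 - \pi_1), \ldots, (p_{t-1} - \pi_{t-1})$ to 
have for large samples the multivariate normal distribution with zero 
mean and dispersion matrix $[\lambda'_{ij}]^{-1}$. If $\sigma_{ij}$ is the covariance of $p_i - \pi_i$ 
and $p_j - \pi_j$ ($i \ne j$; $i, j = 1, \ldots, t - 1$), then 

$\sigma_{tj} = -\sum_{i=1}^{t-1} \sigma_{ij}, \quad j = 1, \ldots, t.$ (25)

In the null case where all $n_{ij} = n$ and $\pi_i = 1/t$ 

$V(p_i) = 4(t - 1)/nt^4, \quad \mathrm{Cov}(p_i, p_j) = -4/nt^4, \quad i \ne j.$ (26)

However, in our case the $n_{ij}$ are not all necessarily equal, so that in the 
null case 

$\lambda^0_{ii} = (t^2/4) \sum_h' n_{ih}, \quad \lambda^0_{ij} = -(t^2/4)\, n_{ij}, \quad i \ne j.$ (27)

The superscript 0 will be taken in the subsequent text to indicate the null 
case, when $\pi_i = 1/t$. Inversion of $[\lambda'^{\,0}_{ij}]$ yields $[\sigma^0_{ij}]$ ($i, j = 1, \ldots, t - 1$) 
for the null case. 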


6. Pairwise Precision 


In some situations we may be interested in the pairwise comparison 
of $T_i$ and $T_j$ given by $P_{ij} = \mathrm{est}(\pi_{ij}) = p_i/(p_i + p_j)$. Using the first 
two terms of the Taylor expansion 

$V(P_{ij}) = (\partial P_{ij}/\partial p_i)^2 V(p_i) + (\partial P_{ij}/\partial p_j)^2 V(p_j) + 2(\partial P_{ij}/\partial p_i)(\partial P_{ij}/\partial p_j)\, \mathrm{Cov}(p_i, p_j),$ (28)

we find that the variance of $P_{ij}$ is approximated by 

$V(P_{ij}) = [\pi_j^2 \sigma_{ii} + \pi_i^2 \sigma_{jj} - 2\pi_i\pi_j\sigma_{ij}]/(\pi_i + \pi_j)^4,$ (29)

so that in the null case 

$V(P_{ij} \mid H_0) = (t^2/16)[\sigma^0_{ii} + \sigma^0_{jj} - 2\sigma^0_{ij}].$ (30)

If we had run only the pair $(i, j)$ and had obtained $N_{ij}$ repetitions, the 
proportion ranking $i$ over $j$ would have the same meaning as $P_{ij}$ but 
would have variance $1/4N_{ij}$ in the null case. To determine what 
sample size would have been necessary in a two-sample test in order 
to obtain $P_{ij}$ with the same precision as results from the paired com- 
parisons analysis, we equate $1/4N_{ij}$ to (30). When solved for $N_{ij}$, 
this yields 

$N_{ij} = 4/[t^2(\sigma^0_{ii} + \sigma^0_{jj} - 2\sigma^0_{ij})].$ (31)

When all $n_{ij} = n$, (31) reduces to $N_{ij} = nt/2$ for all $i$ and $j$, $i \ne j$, a 
result which Bradley obtained in unpublished work. 

Values of $N_{ij}$ have been obtained in terms of the $n_{ij}$ for $t = 3$ and 
$t = 4$. These are for $t = 3$: 

$N_{12} = n_{12} + n_{13}n_{23}/(n_{13} + n_{23})$ for pair (1, 2), 
$N_{13} = n_{13} + n_{12}n_{23}/(n_{12} + n_{23})$ for pair (1, 3), and (32)
$N_{23} = n_{23} + n_{12}n_{13}/(n_{12} + n_{13})$ for pair (2, 3). 
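
The route from the $n_{ij}$ to the $N_{ij}$ can be sketched numerically. The code below is an illustration under stated assumptions: it takes the null-case matrix to have diagonal elements $(t^2/4)\sum_h n_{ih}$ and off-diagonal elements $-(t^2/4)n_{ij}$, reduces it by the last treatment, inverts, borders the inverse so that rows and columns sum to zero, and applies $N_{ij} = 4/[t^2(\sigma^0_{ii} + \sigma^0_{jj} - 2\sigma^0_{ij})]$. The function names are hypothetical, and the design must connect all treatments for the inverse to exist.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    k = len(matrix)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def pairwise_precisions(n):
    """Null-case N_ij for every pair; n is symmetric with a zero diagonal."""
    t = len(n)
    c = t * t / 4.0
    lam = [[c * sum(n[i]) if i == j else -c * n[i][j] for j in range(t)]
           for i in range(t)]
    m = t - 1
    reduced = [[lam[i][j] - lam[i][m] - lam[j][m] + lam[m][m]
                for j in range(m)] for i in range(m)]
    # Border the (t-1) x (t-1) inverse so every row and column sums to zero.
    sigma = [row + [-sum(row)] for row in invert(reduced)]
    sigma.append([-sum(sigma[i][j] for i in range(m)) for j in range(t)])
    return {(i + 1, j + 1):
            4.0 / (t * t * (sigma[i][i] + sigma[j][j] - 2 * sigma[i][j]))
            for i in range(t) for j in range(i + 1, t)}


balanced = [[0 if i == j else 2 for j in range(5)] for i in range(5)]
three = [[0, 4, 6], [4, 0, 3], [6, 3, 0]]
print(pairwise_precisions(balanced)[(1, 2)])   # nt/2 with n = 2, t = 5
print(pairwise_precisions(three))
```

For the three-treatment input the results can be compared with the closed forms for $t = 3$, for example $N_{12} = 4 + (6)(3)/(6 + 3) = 6$.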
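For $t = 4$, the $N_{ij}$ are: 

$N_{12} = n_{12} + \dfrac{n_{13}n_{23}(n_{14} + n_{24}) + n_{14}n_{24}(n_{13} + n_{23}) + n_{34}(n_{13} + n_{14})(n_{23} + n_{24})}{(n_{14} + n_{24})(n_{13} + n_{23}) + n_{34}(n_{13} + n_{14} + n_{23} + n_{24})}$ 

for the pair (1, 2) and, in general, for pair $(i, j)$ 

$N_{ij} = n_{ij} + \dfrac{n_{ik}n_{jk}(n_{ih} + n_{jh}) + n_{ih}n_{jh}(n_{ik} + n_{jk}) + n_{kh}(n_{ik} + n_{ih})(n_{jk} + n_{jh})}{(n_{ih} + n_{jh})(n_{ik} + n_{jk}) + n_{kh}(n_{ih} + n_{ik} + n_{jh} + n_{jk})}$ (33)

where $k \ne h$, and $k, h \ne i, j$. 
It seems undesirable to find the $N_{ij}$ explicitly in terms of the $n_{ij}$ for 
$t$ larger than 4. 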
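
A short sketch of the closed form for $t = 4$ follows, applied to the repetition counts of the Section 8 example. The function name `n_pair_t4` is hypothetical, and the numerator is written in one of several algebraically equivalent groupings.

```python
def n_pair_t4(n, i, j):
    """Null-case pairwise precision N_ij for t = 4 from the closed form."""
    k, h = [x for x in (1, 2, 3, 4) if x not in (i, j)]

    def g(u, v):
        # n_uv is symmetric; pairs that were not run count as zero.
        return n.get((min(u, v), max(u, v)), 0)

    numerator = (g(i, k) * g(j, k) * (g(i, h) + g(j, h))
                 + g(i, h) * g(j, h) * (g(i, k) + g(j, k))
                 + g(k, h) * (g(i, k) + g(i, h)) * (g(j, k) + g(j, h)))
    denominator = ((g(i, h) + g(j, h)) * (g(i, k) + g(j, k))
                   + g(k, h) * (g(i, h) + g(i, k) + g(j, h) + g(j, k)))
    return g(i, j) + numerator / denominator


# Repetitions per pair in the taste-test example; pair (3, 4) was not run.
n = {(1, 2): 140, (1, 3): 54, (1, 4): 57, (2, 3): 63, (2, 4): 58}
for pair in [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]:
    print(pair, round(n_pair_t4(n, *pair), 1))
```

Note that a pair which was never run, here (3, 4), still receives a positive precision through its comparisons with the other treatments.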


7. Balanced Sub-Sets 


The variable numbers of returns from mail-out tests had prompted 
the author to consider the problem of unequal numbers of repetitions 
on the pairs. Rather than discard some of the judgments to obtain a 
balanced set of data, it was desired to use all of the data. Other appli- 
cations of the method occur when the results of a balanced test are 
supplemented by additional data on one or more of the pairs and when 
two or more tests which have common members are combined. 

In the present section we consider still another possible application. 
Since the number of pairs increases so rapidly as ¢ increases, a test with 
many treatments may become administratively unwieldy. In order 
to reduce the difficulties of administration, it may be desired to omit 
some pairs completely. Apart from administration, running all possible 
pairs might require more judgments than are economically feasible. 
In either case we would want to run a balanced sub-set of pairs. 

Wilkinson [1957] has considered the situation where there are 
incomplete repetitions, in the sense that each judge considers $r$ pairs, 
$1 \le r \le t(t - 1)/2$. The designs he uses, however, have, in total, an 
equal number of comparisons on each of the $t(t - 1)/2$ pairs. 

We will consider balanced sub-sets in the sense that $n_{ij} = n$ for 
certain $(i, j)$ and 0 for the other $(i, j)$. The designs we have selected 
for study here are given by Clatworthy [1955]. The pertinent definitions 
for these partially balanced incomplete block designs are: 


a. There are $t$ treatments (Clatworthy uses $v$ as the number of treat- 
ments). 

b. There are $b$ blocks, with $k = 2$ treatments in each block. 

c. Each of the $t$ treatments occurs $r$ times in the arrangement, so that 
$tr = 2b$. 

d. Every pair of treatments occurs either $\lambda_1$ or $\lambda_2$ times. We limit our 
attention to $\lambda_1 = 0$ and $\lambda_2 = 1$ or $\lambda_1 = 1$ and $\lambda_2 = 0$. (This is an 
unavoidable conflict in notation. The values of $\lambda_1$ and $\lambda_2$ defined 
here have no relationship to the $\lambda$'s defined in Section 5.) 

e. Each treatment occurs with $n_1$ other treatments, which are first 
associates, and does not occur with the remaining $n_2 = t - 1 - n_1$ 
treatments, which are second associates. 


We will examine four types of designs discussed by Clatworthy. 

Each experiment will consist of $nb$ judgments. Had we spread 
the $nb$ judgments equally over the $t(t - 1)/2$ pairs, the pairwise pre- 
cision ($= N_{ij}$) for each pair would have been $nb/(t - 1)$ in the null 
case. Running a balanced sub-set of pairs will result on the average 
in a loss of information. We give the efficiency of each design, defining 
efficiency as the average pairwise precision divided by $nb/(t - 1)$. 

A. Group Divisible Designs. The $t = mn'$ treatments (although 
Clatworthy and others use $t = mn$, we must use $n'$ to distinguish from 
the previously defined $n$, which is the number of repetitions for those 
pairs which are run) can be divided into $m$ groups of $n'$ treatments each, 
such that the pairs within a group are not run. Table 1 gives the 
details of group divisible designs with $2 \le r \le 10$. In constructing 
Design 2, for example, we may have treatments 1, 2, and 3 in one 
group and treatments 4, 5, and 6 in the other. Since we will not run 
pairs within a group the only pairs we will run are (1, 4), (1, 5), (1, 6), 
(2, 4), (2, 5), (2, 6), (3, 4), (3, 5), and (3, 6). 


TABLE 1 
GROUP DIVISIBLE DESIGNS WITH 2 ≤ r ≤ 10 

Clatworthy's   Treatments   Blocks   Replications   Groups   No. in Group   Efficiency 
Design No.     t            b        r              m        n'             E 

1              4            4        2              2        2              .917 
2              6            9        3              2        3              .933 
3              6            12       4              3        2              .967 
4              8            16       4              2        4              .946 
10             8            24       6              4        2              .982 
11             9            27       6              3        3              .972 
7              10           25       5              2        5              .956 
23             10           40       8              5        2              .989 
12             12           36       6              2        6              .962 
[illegible]    12           48       8              3        4              .977 
35             12           54       9              4        3              .985 
44             12           60       10             6        2              .992 
17             14           49       7              2        7              .967 
45             15           75       10             3        5              .981 
25             16           64       8              2        8              .971 
36             18           81       9              2        9              .974 
46             20           100      10             2        10             .976 

For the designs given in Table 1 we will have in the null case a 
pairwise precision $N_{ij} = nb/(t - 1)$ for those pairs which are run and 
$N_{ij} = nb/t$ for those pairs not run. The efficiency of the design is 
$E = [2b + t(t - 1)^2]/[t^2(t - 1)]$. 
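
As an illustration, the pairs of a group divisible design and its efficiency can be generated as follows. The function names are hypothetical; the efficiency is computed from the closed form $E = [2b + t(t - 1)^2]/[t^2(t - 1)]$, which is what results from averaging $nb/(t - 1)$ over the pairs run and $nb/t$ over the pairs not run and dividing by $nb/(t - 1)$.

```python
from itertools import combinations


def gd_pairs(m, n_prime):
    """Pairs run in a group divisible design: treatments in different groups."""
    group_of = {tr: (tr - 1) // n_prime for tr in range(1, m * n_prime + 1)}
    return [(i, j) for i, j in combinations(sorted(group_of), 2)
            if group_of[i] != group_of[j]]


def gd_efficiency(t, b):
    """Efficiency of a group divisible design with t treatments, b blocks."""
    return (2 * b + t * (t - 1) ** 2) / (t ** 2 * (t - 1))


pairs = gd_pairs(2, 3)            # two groups of three: {1, 2, 3} and {4, 5, 6}
print(pairs)
print(round(gd_efficiency(6, len(pairs)), 3))
```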

B. Triangular Designs. The number of treatments can be described 
as $t = n'(n' - 1)/2$ (we again deviate and use $n'$, in order to avoid 
conflict in notation). The association scheme is given below for $t = 10$ 
and $n' = 5$. 


X    1    2    3    4 
1    X    5    6    7 
2    5    X    8    9 
3    6    8    X   10 
4    7    9   10    X 


We form a square of size $n'$, delete the diagonal, and fill in the treatment 
numbers symmetrically around the diagonal. Two designs are possible: 


1) We run pairs of treatments for those treatments lying in the same 
column; i.e., (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (1, 5), (1, 6), 
(1, 7), (5, 6), (5, 7), (6, 7), (2, 5), (2, 8), (2, 9), (5, 8), (5, 9), (8, 9), 
(3, 6), (3, 8), (3, 10), (6, 8), (6, 10), (8, 10), (4, 7), (4, 9), (4, 10), 
(7, 9), (7, 10) and (9, 10). 

2) For $n' > 4$ (Designs 5, 6, and 7), we run pairs of treatments for 
those treatments not lying in the same column; i.e., (1, 8), (1, 9), 
(1, 10), (2, 6), (2, 7), (2, 10), (3, 5), (3, 7), (3, 9), (4, 5), (4, 6), (4, 8), 
(5, 10), (6, 9), and (7, 8). 

Table 2 gives the details of the triangular designs with $2 \le r \le 10$. 

For Designs 1 through 4 in Table 2 the pairwise precision in the null 
case is $N_{ij} = nb/(t - 1)$ for pairs which are run and $N_{ij} = nb/[t - 2 + (n'/2)]$ 
for those pairs not run. The efficiency of these designs is given by 
$E = [4/(n' + 1)] + [(n' - 3)/(n' + 2)]$. For Designs 5, 6, and 7, the 
pairwise precision in the null case is $N_{ij} = nb/(t - 1)$ for those pairs 
which are run and $N_{ij} = nb/[t + 2/(n' - 4)]$ for those pairs not run. 
The efficiency is $E = [2b/t(t - 1)] + \{[(t - 1) - 2b/t]/[t + 2/(n' - 4)]\}$. 


TABLE 2 
TRIANGULAR DESIGNS WITH 2 ≤ r ≤ 10 

Clatworthy's   Treatments   Blocks   Replications   Size of Square   Efficiency 
Design No.     t            b        r              n'               E 

1              6            12       4              4                .967 
2              10           30       6              5                .952 
3              15           60       8              6                .946 
4              21           105      10             7                .944 
5              10           15       3              5                .833 
6              15           45       6              6                .929 
7              21           105      10             7                .962 
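
The two efficiency expressions for the triangular designs are easy to check numerically. The sketch below is illustrative only and its function names are hypothetical; both functions follow from weighting the precision of the pairs run and of the pairs not run by their shares of the $t(t - 1)/2$ pairs.

```python
def efficiency_same_column(n_prime):
    """Designs whose pairs lie in a common column of the scheme."""
    return 4 / (n_prime + 1) + (n_prime - 3) / (n_prime + 2)


def efficiency_other(t, b, n_prime):
    """Designs whose pairs do not lie in a common column (n' > 4)."""
    share_run = 2 * b / (t * (t - 1))
    return share_run + ((t - 1) - 2 * b / t) / (t + 2 / (n_prime - 4))


for n_prime in (4, 5, 6, 7):
    print(n_prime, round(efficiency_same_column(n_prime), 3))
for t, b, n_prime in ((10, 15, 5), (15, 45, 6), (21, 105, 7)):
    print(t, b, round(efficiency_other(t, b, n_prime), 3))
```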

C. Square Designs. Here we have $t = s^2$. The designs are con- 
structed by placing the $t$ treatment numbers into a square of size $s$ and 
running pairs of treatments for those treatments common to a column 
or row. Thus for $t = 9$ we have 


1    2    3 
4    5    6 
7    8    9 


We run the pairs among (1, 4, 7), (2, 5, 8), (3, 6, 9), (1, 2, 3), (4, 5, 6), 
and (7, 8, 9). In an alternate design (number 11 in Table 3) we pair 
each treatment with those treatments not lying in the same row or 
column of the association scheme. Table 3 gives the square designs 
for $2 \le r \le 10$. For Designs 1 through 5 the pairwise precision in 
the null case is $N_{ij} = nb/(t - 1)$ for those pairs which are run and 
$N_{ij} = nb/(t + s - 2)$ for those pairs not run. The efficiency is 
$E = [2/(s + 1)] + [(s - 1)/(s + 2)]$. 


TABLE 3 
SQUARE DESIGNS WITH 2 ≤ r ≤ 10 

Clatworthy's   Treatments   Blocks   Replications   Size of Square   Efficiency 
Design No.     t            b        r              s                E 

1              4            4        2              2                .917 
2              9            18       4              3                .900 
3              16           48       6              4                .900 
4              25           100      8              5                .905 
5              36           180      10             6                .911 
11             16           72       9              4                [illegible] 
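
A small sketch of the row-and-column construction and of the efficiency of the corresponding designs is given below. The function names are hypothetical, and the efficiency function assumes the form $E = 2/(s + 1) + (s - 1)/(s + 2)$, which is what the two precisions $nb/(t - 1)$ and $nb/(t + s - 2)$ give when averaged over all pairs.

```python
from itertools import combinations


def square_pairs(s):
    """Pairs of treatments sharing a row or a column of an s x s square."""
    def cell(row, col):
        return row * s + col + 1      # treatments numbered row by row

    pairs = set()
    for k in range(s):
        row_members = [cell(k, col) for col in range(s)]
        col_members = [cell(row, k) for row in range(s)]
        pairs.update(combinations(row_members, 2))
        pairs.update(combinations(col_members, 2))
    return pairs


def square_efficiency(s):
    """Efficiency of the row-and-column square design of size s."""
    return 2 / (s + 1) + (s - 1) / (s + 2)


pairs = square_pairs(3)
print(len(pairs), sorted(pairs)[:6])
print([round(square_efficiency(s), 3) for s in (4, 5)])
```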

D. Cyclic Designs. The construction of the cyclic designs is covered 
by Clatworthy and will not be discussed here. Table 4 gives the cyclic 
designs with $2 \le r \le 10$. 


TABLE 4 
CYCLIC DESIGNS WITH 2 ≤ r ≤ 10 

Clatworthy's   Treatments   Blocks   Replications   Efficiency 
Design No.     t            b        r              E 

1              5            5        2              .833 
10             13           39       6              .929 
11             17           68       8              .944 


For the designs in Table 4 the pairwise precision in the null case is 
$N_{ij} = nb/(t - 1)$ for those pairs which are run and $nb/(t + 1)$ for those 
pairs not run. The efficiency is $E = t/(t + 1)$. 


8. A Practical Example 


Two tests were run in which two types of a product (coded $T_1$ and 
$T_2$) appeared in each test. Two other types of the same product (coded 
$T_3$ and $T_4$) were tested, one in each paired comparisons test. The 
results are summarized in Table 5. Note that the pair ($T_3$, $T_4$) was 
not run. The treatment totals ($a_i$), the first estimates of the $p_i$, and 
the maximum likelihood estimates ($p_i$) are given in Table 6. 


TABLE 5 
SUMMARY OF RESULTS OF A TASTE TEST 

Pair          Treatment   Preferences     Pair          Treatment   Preferences 

T_1, T_2      T_1          28             T_2, T_3      T_2          46 
              T_2         112                           T_3          17 
              n_12        140                           n_23         63 

T_1, T_3      T_1          15             T_2, T_4      T_2          47 
              T_3          39                           T_4          11 
              n_13         54                           n_24         58 

T_1, T_4      T_1          23             T_3, T_4      T_3           0 
              T_4          34                           T_4           0 
              n_14         57                           n_34          0 


TABLE 6 
ANALYSIS OF THE ILLUSTRATIVE EXAMPLE 

                          First Estimator 
Treatment (i)    a_i      of p_i             p_i 

1                 66       .106              .1082 
2                205       .550              .5193 
3                 56       .234              .2294 
4                 45       .176              .1431 
Totals           372      1.066             1.0000 


From (12) we find $B_1 = 89.5992$. From (13) we find $\chi^2 = 103.08$ with 
3 degrees of freedom and, therefore, conclude that the $\pi_i$ are not equal. 
From (17) we find $\chi^2 = 2.00$ with $[3(2)/2] - 1 = 2$ degrees of freedom 
so that evidence to doubt the appropriateness of the model is lacking. 
To find the pairwise precisions we find 

$[\lambda^0_{ij}] = 4 \begin{bmatrix} 251 & -140 & -54 & -57 \\ -140 & 261 & -63 & -58 \\ -54 & -63 & 117 & 0 \\ -57 & -58 & 0 & 115 \end{bmatrix}, \quad [\lambda'^{\,0}_{ij}] = 4 \begin{bmatrix} 480 & 90 & 118 \\ 90 & 492 & 110 \\ 118 & 110 & 232 \end{bmatrix},$ 

$[\lambda'^{\,0}_{ij}]^{-1} = (1/4) \begin{bmatrix} .0023961 & -.0001855 & -.0011307 \\ -.0001855 & .0022879 & -.0009904 \\ -.0011307 & -.0009904 & .0053551 \end{bmatrix}.$ 

This yields a variance-covariance matrix 

$[\sigma^0_{ij}] = (1/4) \begin{bmatrix} .0023961 & -.0001855 & -.0011307 & -.0010799 \\ -.0001855 & .0022879 & -.0009904 & -.0011120 \\ -.0011307 & -.0009904 & .0053551 & -.0032340 \\ -.0010799 & -.0011120 & -.0032340 & .0054259 \end{bmatrix}.$ 

The $N_{ij}$ computed from this matrix, the $P_{ij}$, and $\sqrt{V(P_{ij})}$ 
are given in Table 7. 


TABLE 7 
PAIRWISE COMPARISONS AND PRECISIONS 

T_i    T_j    N_ij     P_ij (%)    √V(P_ij) (%) 

2      3      103.9    69.36       4.52 
2      4      100.6    78.40       4.10 
2      1      197.8    82.76       2.69 
3      4       58.0    61.58       6.39 
3      1       99.9    67.95       4.67 
4      1      100.2    56.94       4.95 


We find that the $P_{ij}$ for all pairs but pairs (3, 4) and (4, 1) exceed 50% 
by more than 2 standard deviations. 


Summary 


The previous papers assumed balanced information, which may not 
obtain in some experimental situations. The present paper has given 
the analogous procedures for unbalanced data. An investigation of 
balanced sub-sets of pairs gives some indication of the efficiency of 
unequal numbers of repetitions. 
PATH COEFFICIENTS AND PATH REGRESSIONS: 
ALTERNATIVE OR COMPLEMENTARY CONCEPTS? 


SEWALL WRIGHT 
Department of Genetics, University of Wisconsin 
Madison, Wisconsin, U.S.A. 


INTRODUCTION 


In a recent paper, Turner and Stevens [1959] develop a modification 
of the method of path coefficients (Wright [1921, 1934, 1954]). Follow- 
ing Tukey [1954] they advocate systematic replacement in path analysis 
of the dimensionless path coefficients by the corresponding concrete 
path regressions. The purpose of the present paper is to discuss this 
and other points which they raise. 


(1) The authors concur with Tukey in treating the standardized 
and concrete forms of correlational statistics as if they were alternative 
conceptions between which it is necessary to make a choice. It has 
always seemed to me that these should be looked upon as two aspects 
of a single theory corresponding to different modes of interpretation 
which, taken together, often give a deeper understanding of a situation 
than either can give by itself. 

(2) Even when the sole objectives of analysis are the concrete coeffi- 
cients, actual path analysis takes a simpler and more homogeneous form 
in terms of the standardized ones. The application of the method to 
data usually requires algebraic manipulation of coefficients pertaining 
to unmeasured variables on the same basis as measured ones. As the 
former can only be dealt with in standardized form, homogeneity 
requires that all be so dealt with in the course of the algebra. It is 
such a simple matter to pass from either form to the other (in the cases 
in which standard deviations are available to all) that the economy of 
effort in using the concrete coefficients as far as possible, where these 
are the objectives, is usually outweighed by the loss of economy in 
other respects. 

(3) It is of first importance in path analysis to make use of all of 
the available data. This is not done by Turner and Stevens in most 
of their examples. The use of standardized coefficients leads naturally 
to the systematic expression of all of the available information in the 
form of equations to be solved simultaneously. 

(4) Turner and Stevens, again following Tukey, go into the direct 
treatment of reciprocal interaction between variables by path analysis. 
This interesting topic requires more extended discussion than is appropri- 
ate here. 


REVIEW OF THE METHOD OF PATH ANALYSIS 


Before taking up these points in detail, it seems necessary to review 
the method briefly to try to clear up certain misunderstandings. 

The method is one for dealing with a system of interrelated variables. 
It is based on the construction of a qualitative diagram in which every 
included variable, measured or hypothetical, is represented (by arrows) 
either as completely determined by certain others (which may be repre- 
sented as similarly determined) or as an ultimate factor. Each ultimate 
factor in the diagram must be connected (by lines with arrowheads at 
both ends) with each of the other ultimate factors to indicate possible 
correlation through still more remote unrepresented factors, except in 
cases in which it can safely be assumed that there is no correlation. 

The necessary formal completeness of the diagram requires the 
introduction of a symbol for the array of unknown residual factors 
among those back of each variable that is not represented as one of 
the ultimate factors, unless it can safely be assumed that there is 
complete determination by the known factors. Such a residual factor 
can be assumed by definition to be uncorrelated with any of the other 
factors immediately back of the same variable, but cannot be assumed 
to be independent of other variables in the system without careful 
consideration. 

It is assumed here that all relations are linear: non-linear relations 
may sometimes be transformed systematically throughout a diagram 
into linear ones. Approximate results may be obtained without trans- 
formation where deviations from linearity are small within the range 
of actual variation. Thus a product of uncorrelated variables $XY$ may 
be treated as approximately additive, $\delta(XY) = \bar{Y}\,\delta X + \bar{X}\,\delta Y$, if the 
coefficients of variability $\sigma_X/\bar{X}$ and $\sigma_Y/\bar{Y}$ are not too large. If the latter 
are equal, the fraction of the variance that is excluded is less than 
half the squared coefficient of variability. It is also possible to deal 
rigorously with joint variability in restricted cases but this extension 
will not be dealt with here. 
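
The statement about the excluded fraction can be illustrated numerically. The sketch below assumes that $X$ and $Y$ are independent (a stronger condition than being uncorrelated), in which case the variance of the product is exactly $\bar{Y}^2\sigma_X^2 + \bar{X}^2\sigma_Y^2 + \sigma_X^2\sigma_Y^2$ and the last term is the part ignored by the additive approximation. The function name is hypothetical.

```python
def excluded_fraction(cx, cy):
    """Share of Var(XY) left out by the additive approximation.

    cx, cy are the coefficients of variability of X and Y (independent).
    """
    linear_part = cx ** 2 + cy ** 2       # relative variance kept
    product_part = (cx * cy) ** 2         # relative variance excluded
    return product_part / (linear_part + product_part)


for c in (0.05, 0.10, 0.20, 0.50):
    print(c, round(excluded_fraction(c, c), 5), round(c ** 2 / 2, 5))
```

With equal coefficients of variability $c$ the excluded fraction is $c^2/(2 + c^2)$, which is always below $c^2/2$.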

The validity of the system requires that variables that enter into 
two or more relations in the system (as a common factor of two or 
more, or as an intermediary in a chain) act as if point variables. If 
one part of a composite variable (such as a total or average) is more 
significant in one relation and another part in another, the treatment 
of the variable as if it were a unit may lead to grossly erroneous results. 
Fortunately, the parts of a composite variable are often known to be 
so strongly correlated in their values, or in their action, or both, that 
they may be used to obtain approximate results. It cannot be empha- 
sized too much, however, that apart from special extensions, the strict 
validity of the method depends on the properties of formally complete 
linear systems of point variables. 

The primary purpose was stated in the first general account (Wright 
[1921]) as follows: 


“The present paper is an attempt to present a method of measuring the direct 
influence along each separate path in such a system and thus of finding the degree 
to which variation of a given effect is determined by each particular cause. The 
method depends on the combination of knowledge of the degrees of correlation among 
the variables in a system with such knowledge as may be possessed of the causal 
relations. In cases in which the causal relations are uncertain, the method can be 
used to find the logical consequences of any particular hypothesis in regard to them.” 


It was brought out here and later that the method 


“is by no means restricted to relations that can be described as ones of cause and 
effect. It can be applied to purely mathematical systems of linear relations and 
merges into the methods of multiple regression and multivariable vectorial analysis 
when applied to the symmetrical systems of relations that characterize these 
methods” [1954]. 


The basic diagram in developing the theory is one in which a variable 
$V_0$ (Figure 1) is represented as completely determined by a number of 
immediate factors $V_1, V_2, \ldots, V_m, V_u$ all of which, except the unknown 
residual $V_u$, are represented as intercorrelated. We are to consider 
the correlation of $V_0$ with any variable $V_q$. The latter must be repre- 
sented as correlated with each of the factors of $V_0$ including the residual 
$V_u$, if there is no reason to the contrary. 

FIGURE 1. 

As all relations are assumed to be linear, we have: 


$V_0 = c_0 + c_{01}V_1 + c_{02}V_2 + \cdots + c_{0m}V_m + c_{0u}V_u .$ (1)


The coefficients $c_{01}$, etc. are of the type of partial regression co- 
efficients but are in a system that involves the residual $V_u$ (unless there 
is known to be complete determination by the other immediate factors). 
It may involve other unmeasured hypothetical variables. The co- 
efficients are thus ordinarily not deducible directly from the statistics 
in the way that is possible for a conventional partial regression co- 
efficient such as $b_{01.23}$ where $V_2$ and $V_3$ as well as $V_0$ and $V_1$ are measured 
variables. They have meaning only in connection with a specified 
diagram. The symbol $c_{01}$ is used to distinguish such a quantity, defined 
as a path regression coefficient (Wright [1921]), from the total regression 
coefficient $b_{01}$. 

The path regression coefficient $c_{01}$ measures the concrete con- 
tribution that $V_1$ is supposed to make directly to $V_0$ from the point of 
view represented in the diagram. If this correctly represents the 
causal relations, the path regression measures this contribution in an 
absolute sense and its value can be used in the analysis of other popu- 
lations (Wright [1921, 1931]). Tukey [1954] and Turner and Stevens 
[1959] properly emphasize this virtue. The standardized path co- 
efficient $p_{01} = c_{01}\sigma_1/\sigma_0$ obviously does not have this property, but it 
has other virtues including greater convenience in analysis. 


Let $X_0 = (V_0 - \bar{V}_0)/\sigma_0$, etc. 
Then 

$X_0 = p_{01}X_1 + p_{02}X_2 + \cdots + p_{0m}X_m + p_{0u}X_u .$ (2)

In this standardized form, all correlation coefficients are reduced 
to product moments. 

$r_{0q} = (1/n) \sum X_0 X_q = p_{01}r_{1q} + p_{02}r_{2q} + \cdots + p_{0m}r_{mq} + p_{0u}r_{uq} .$ (3)

If $V_q$ is one of the immediate factors, e.g. $V_1$, $r_{1u} = 0$, 

$r_{01} = p_{01} + p_{02}r_{12} + \cdots + p_{0m}r_{1m} .$ (4)

If $V_q$ is $V_0$ itself, 

$r_{00} = p_{01}r_{01} + p_{02}r_{02} + \cdots + p_{0m}r_{0m} + p_{0u}^2 = 1,$ 
$r_{00} = \sum_{i=1}^{m} p_{0i}r_{0i} + p_{0u}^2$ (where $V_i$ does not include $V_u$). (5)

The term $\sum p_{0i}r_{0i}$ is the squared coefficient of correlation $r_{0E}^2$ with 
the best estimate of $V_0$ that can be made from immediate factors other 
than $V_u$ (squared coefficient of multiple correlation) and $r_{0u}^2 = p_{0u}^2$ 
$(= 1 - \sum p_{0i}r_{0i})$ is the squared error of estimate. Thus $r_{00} = r_{0E}^2 + r_{0u}^2$. 
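
A small numerical illustration of complete determination may help. The values below are hypothetical: two immediate factors with path coefficients $p_{01}$, $p_{02}$ and intercorrelation $r_{12}$, plus a residual uncorrelated with both. The correlations with $V_0$ follow from applying the basic formula to each factor in turn, and the residual term is whatever is needed to bring the total to one.

```python
# Hypothetical path coefficients and correlation between the two factors.
p01, p02, r12 = 0.6, 0.3, 0.5

# Correlation of V0 with each immediate factor (residual uncorrelated).
r01 = p01 + p02 * r12
r02 = p02 + p01 * r12

# Squared multiple correlation and squared residual path coefficient.
multiple_r2 = p01 * r01 + p02 * r02
p0u_squared = 1.0 - multiple_r2

# The same total, split into direct and correlational determination.
expanded = p01 ** 2 + p02 ** 2 + 2 * p01 * p02 * r12

print(round(r01, 4), round(r02, 4))
print(round(multiple_r2, 4), round(p0u_squared, 4), round(expanded, 4))
```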

Returning to (3), which it may be noted does not depend on the 
assumption of normality of any of the variables, we note that it contains 
correlation coefficients which are capable of analysis by application 
of this formula to itself, if any of the immediate factors or $V_q$ are repre- 
sented as determined by more remote factors in a more extended 
diagram. The principle that is arrived at (for systems in which there 
are no paths that return on themselves) may be stated as follows: the 
correlation between any two variables in a properly constructed diagram 
of relations is equal to the sum of contributions pertaining to the paths 
by which one may trace from one to the other in the diagram without 
going back after going forward along an arrow and without passing 
through any variable twice in the same path. A coefficient pertaining 
to the whole path connecting two variables, and thus measuring the 
contribution of that path to the correlation, is known as a compound 
path coefficient. Its value is the product of the values of the coefficients 
pertaining to the elementary paths along its course. One, but not 
more than one of these, may pertain to a two-headed arrow without 
violating the rule against going back after going forward. 
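
The tracing rule can be checked on a small system. The diagram used here is hypothetical and is not one of the figures of this paper: $V_3$ is an ultimate factor, $V_1$ and $V_2$ each depend on $V_3$ and on independent residuals, and $V_0$ depends on $V_1$, $V_2$ and an independent residual. The correlations are first built up by repeated application of the basic formula and then compared with the sums of compound path coefficients.

```python
# paths[child][parent] = standardized path coefficient (hypothetical values).
paths = {1: {3: 0.8}, 2: {3: 0.5}, 0: {1: 0.6, 2: 0.3}}
order = [3, 1, 2, 0]          # every variable is listed after its factors

# Build correlations recursively: r(v, w) = sum over factors u of p_vu * r(u, w),
# valid here because each residual is uncorrelated with all earlier variables.
r = {}
for idx, v in enumerate(order):
    r[(v, v)] = 1.0
    for w in order[:idx]:
        value = sum(p * r[(u, w)] for u, p in paths.get(v, {}).items())
        r[(v, w)] = r[(w, v)] = value

# The same correlations by tracing paths in the diagram.
p01, p02 = paths[0][1], paths[0][2]
p13, p23 = paths[1][3], paths[2][3]
traced = {
    (0, 1): p01 + p02 * p23 * p13,    # direct, plus V0 <- V2 <- V3 -> V1
    (0, 2): p02 + p01 * p13 * p23,    # direct, plus V0 <- V1 <- V3 -> V2
    (0, 3): p01 * p13 + p02 * p23,    # two chains back to V3
    (1, 2): p13 * p23,                # common factor V3
}
for pair, value in traced.items():
    print(pair, round(value, 4), round(r[pair], 4))
```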

A uni-directional compound path coefficient may be indicated by 
listing the variables in order, from dependent to most remote independ- 
ent, as subscripts. Thus in Figure 2, $p_{013}$ pertains to the path 
$V_0 \leftarrow V_1 \leftarrow V_3$ and has the value $p_{01}p_{13}$. In a bi-directional compound 
path coefficient, it is convenient to list the variables in order from 
either end, but set off the ultimate common factor or pair of factors by 
parentheses. Thus $p_{01(3)2}$ in Figure 2 pertains to the path $V_0 \leftarrow V_1 \leftarrow 
V_3 \rightarrow V_2$, with value $p_{01}p_{13}p_{23}$, and [...] pertains to the path $V_0$ [...] 
$V_2$ with value [...]. According to the rule 
above, $r_{02} = p_{02} +$ [...]. In previous papers the turning 
point in a compound path has been indicated by a typographically 
somewhat awkward dot or dash over the pertinent subscript or pair of 
subscripts. 

FIGURE 2. 


CONCRETE VERSUS STANDARDIZED COEFFICIENTS 


The comparison of the uses of standardized path coefficients and 
concrete path regressions can best be made in conjunction with a con- 
sideration of those of ordinary correlation and regression coefficients. 

(1) The coefficient of correlation is a useful statistic in providing 
a scale from −1 through 0 to +1 for comparing degrees of correlation. 

It is not the only statistic that can provide such a scale and these 
scales need not agree. There are statistics that agree at the three 
points referred to above without even rough agreement elsewhere 
(cf. $r_{01}^3$ with $r_{01}$), but, if one has become familiar with the situation 
implied by such values of $r_{01}$ as .10, .50, .90 etc., the specification of 
the correlation coefficient conveys valuable information about the 
population that is under consideration. It is not a good reason to dis- 
card this coefficient because some have made the mistake of treating 
it as if it were an absolute property of the two variables. The term 
path coefficient was first used in an analysis of variability of amount 
of white in the spotted pattern of guinea pigs (Wright [1920]). There 
was a correlation of +.211 ± .015 between parent and offspring and 
one of +.214 ± .018 between litter mates in a randomly bred strain. 
The fact that the corresponding correlations in another population 
(one tracing to a single mating after seven generations of brother- 
sister mating) were significantly different (+.014 ± .022 and +.069 
± .028 respectively) presented in a clear way a question for analysis 
for which the method of path coefficients seemed well adapted, not a 
reason for abandoning correlation coefficients. 

Path coefficients resemble correlation coefficients in describing 
relations on an abstract scale. A diagram of functional relations, in 
which it is possible to assign path coefficients to each arrow, gives at a 
glance the relative direct contributions of variability of the immediate 
causal factors to variability of the effect in each case. They differ from 
correlation coefficients in that they may exceed +1 or −1 in absolute 
value. Such a value shows at a glance that direct action of the factor 
in question is tending to bring about greater variability than is actually 
observed. The direct effect must be offset by opposing correlated 
effects of other factors. As in the case of correlation coefficients, the 
fact that the same type of interpretative diagram yields different path 
coefficients when applied to different populations is valuable for com- 
parative purposes. 

(2) The correlation coefficient seems to be the most useful parameter 
for supplementing the means and standard deviations of normally 
distributed variables in describing bivariate and multivariate dis- 
tributions in mathematical form. Its value in this connection does 
not, however, mean that its usefulness otherwise is restricted to normally 
distributed variables as seems to be implied by Turner and Stevens. 

The last point may be illustrated by citing one of the basic correl- 
ation arrays of population genetics, that for parent and offspring in a 
random breeding population, with respect to a single pair of alleles 
and no environmental complications. 


Parent    Offspring                                        Total          Grade 
          aa             Aa             AA 

AA        0              q²(1 − q)      q³                 q²             a_0 + a_1 + a_2 
Aa        q(1 − q)²      q(1 − q)       q²(1 − q)          2q(1 − q)      a_0 + a_1 
aa        (1 − q)³       q(1 − q)²      0                  (1 − q)²       a_0 
Total     (1 − q)²       2q(1 − q)      q²                 1 


$r_{OP} = \dfrac{(1 - q)^2 a_1^2 + 2q(1 - q)a_1a_2 + q^2a_2^2}{(1 - q)(2 - q)a_1^2 + 2q(1 - q)a_1a_2 + q(1 + q)a_2^2}$ 

$= 1/2$ if no dominance, $a_1 = a_2$. 


As each variable takes only three discrete values, as the intervals 
are unequal unless dominance is wholly lacking ($a_1 = a_2$), and as gene 
frequency $q$ may take any value between 0 and 1, making extreme 
asymmetry possible, this distribution is very far from being bivariate 
normal. Obviously this correlation coefficient has no application as a 
parameter of such a distribution. It may, however, be used rigorously 
in all of the other respects discussed here. It may be noted 
that its value may be deduced rigorously from the population array 
$[(1 - q)a + qA]^2$, the assigned grades, and certain path coefficients 
that are obvious on inspection from a diagram representing the relation 
of parental and offspring genotypes under the Mendelian mechanism. 
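
The parent-offspring correlation can be verified directly from the array. The sketch below is illustrative: it builds the joint distribution of parental and offspring genotypes under random mating with frequency $q$ of allele $A$, computes the product-moment correlation of the grades, and compares it with the ratio of the two quadratic forms in $a_1$ and $a_2$. The function names are hypothetical, and $a_0$ is set to zero since it does not affect the correlation.

```python
def parent_offspring_r(q, a1, a2):
    """Correlation of parental and offspring grades, computed from the array."""
    grade = {"aa": 0.0, "Aa": a1, "AA": a1 + a2}
    freq = {"AA": q * q, "Aa": 2 * q * (1 - q), "aa": (1 - q) ** 2}
    # P(offspring genotype | parent genotype) when the mate is drawn at random.
    cond = {
        "AA": {"AA": q, "Aa": 1 - q, "aa": 0.0},
        "Aa": {"AA": q / 2, "Aa": 0.5, "aa": (1 - q) / 2},
        "aa": {"AA": 0.0, "Aa": q, "aa": 1 - q},
    }
    mean = sum(freq[g] * grade[g] for g in freq)
    var = sum(freq[g] * (grade[g] - mean) ** 2 for g in freq)
    cov = sum(freq[par] * cond[par][off]
              * (grade[par] - mean) * (grade[off] - mean)
              for par in freq for off in freq)
    return cov / var


def closed_form(q, a1, a2):
    """Ratio of quadratic forms in a1, a2 for the same correlation."""
    num = ((1 - q) * a1 + q * a2) ** 2
    den = ((1 - q) * (2 - q) * a1 ** 2 + 2 * q * (1 - q) * a1 * a2
           + q * (1 + q) * a2 ** 2)
    return num / den


print(parent_offspring_r(0.3, 1.0, 0.2), closed_form(0.3, 1.0, 0.2))
print(parent_offspring_r(0.7, 1.0, 1.0))   # no dominance
```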

The correlation coefficients in a set of variables are statistical 
properties of the population in question that are independent of any 
point of view toward the relations among the variables. The depend- 
ence of calculated path coefficients on the point of view represented in 
a particular diagram, of course, restricts their use as parameters to this 
point of view. 

(3) Assuming linearity, the squared correlation coefficient measures 
the portion of the variance of either of the two variables, that is con- 
trolled directly or indirectly, by the other, in the sense that it gives 
the ratio of the variance of means of one for given values of the other 
to the total variance of the former $[r_{12}^2 = \sigma_{1(2)}^2/\sigma_1^2]$. Correspondingly 
it gives the average portion of the variance of one that is lost at given 
values of the other $[\sigma_{1.2}^2 = \sigma_1^2(1 - r_{12}^2)]$. We are here using $\sigma_{1(2)}^2$ and 
$\sigma_{1.2}^2$ for the components of $\sigma_1^2$ that are dependent and independent 
respectively of variable $V_2$. 

Equation (5), expressing complete determination of $V_0$ by its 
factors, can be expanded into the form 

$r_{00} = \sum p_{0i}^2 + 2 \sum_{i<k} p_{0i}p_{0k}r_{ik} + p_{0u}^2 = 1.$ (6)

On multiplying both sides by $\sigma_0^2$, it may be seen that the squared 
path coefficients measure the portions of $\sigma_0^2$ that are determined directly 
by the indicated factors while other terms (which may be negative) 
measure correlational determination. 

(4) The correlation coefficient $r_{01}$ measures the slope of the line of
means of $V_0$ relative to $V_1$ (or the converse) on standardized scales,
and merely needs to be multiplied by the proper ratio of standard
deviations to express regression in concrete terms ($b_{01} = r_{01}\sigma_0/\sigma_1$,
$b_{10} = r_{01}\sigma_1/\sigma_0$).

As already noted, the abstract path coefficients have the same
relation to the concrete path regressions.

(5) Another statistic of this family, the product moment,

$$M_{11}(V_1V_2) = \mathrm{cov}_{12} = (1/n)\sum (V_1 - \bar{V}_1)(V_2 - \bar{V}_2) = r_{12}\sigma_1\sigma_2,$$

is useful on its own account in various ways. The product moment
in a heterogeneous population may for example be analyzed into the
sum of the product moment of the weighted means of the subpopulations
and the average product moment within these (Wright [1917]).
It was because of this additive property, analogous to that of the
squared standard deviation, that Fisher later renamed this quantity
the covariance in analogy with his term variance (Fisher [1918]) for
the squared standard deviation.

Since the compound path coefficients analyze each correlation into
additive contributions from each chain that connects the two variables,
they merely need to be multiplied by the terminal standard deviations
to give a similar analysis of the covariance.

(6) In certain situations (including important ones in population
genetics) the correlation coefficient can be interpreted as a probability.
This may be analyzable into components in terms of compound path
coefficients.

(7) Finally, the formula for the correlation between linear functions 
is often useful and is the one that leads directly to path analysis. 


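If

$$V_S = c_S + \sum_i c_{Si}V_i, \qquad V_T = c_T + \sum_i c_{Ti}V_i,$$

$$p_{Si} = c_{Si}\sigma_i/\sigma_S, \qquad p_{Ti} = c_{Ti}\sigma_i/\sigma_T,$$

then

$$r_{ST} = \sum_i p_{Si}p_{Ti} + \sum_{i \ne j} p_{Si}p_{Tj}r_{ij}. \qquad (7)$$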

The most extensive applications of the method have been essentially 
of this sort, the deduction of correlations from known functional 
relations in population genetics (Wright [1921, 1931, 1951]). The inverse 
problem, that of deducing path coefficients from known correlations 
and a given pattern of relations, depends on the solution of a system 
of simultaneous equations usually of higher degree than first and 
thus often requires rather tedious iteration. 
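
The deduction of a correlation from known linear relations can be checked numerically. The following is a minimal sketch in Python and is not part of the original paper: the function name, the argument layout and the test values are illustrative assumptions. It converts concrete coefficients to path coefficients by multiplying by the ratio of standard deviations, then adds the direct contributions and the contributions through correlated factors.

```python
import math


def corr_linear_functions(c_s, c_t, sd, r):
    """Correlation between V_S = sum(c_s[i] * V_i) and V_T = sum(c_t[i] * V_i).

    sd[i] is the standard deviation of V_i and r[i][j] the correlation
    between V_i and V_j (with r[i][i] == 1). All names are hypothetical.
    """
    n = len(sd)

    def variance(c):
        # Variance of a linear function of correlated variables.
        return sum(c[i] * c[j] * r[i][j] * sd[i] * sd[j]
                   for i in range(n) for j in range(n))

    sd_s = math.sqrt(variance(c_s))
    sd_t = math.sqrt(variance(c_t))

    # Path coefficients: concrete coefficients put on standardized scales.
    p_s = [c_s[i] * sd[i] / sd_s for i in range(n)]
    p_t = [c_t[i] * sd[i] / sd_t for i in range(n)]

    direct = sum(p_s[i] * p_t[i] for i in range(n))
    through_correlations = sum(p_s[i] * p_t[j] * r[i][j]
                               for i in range(n) for j in range(n) if i != j)
    return direct + through_correlations


# Two uncorrelated unit-variance factors: V_S = V_1, V_T = V_1 + V_2.
identity = [[1.0, 0.0], [0.0, 1.0]]
r_shared = corr_linear_functions([1.0, 0.0], [1.0, 1.0], [1.0, 1.0], identity)

# Two correlated factors, each function depending on one of them only.
correlated = [[1.0, 0.5], [0.5, 1.0]]
r_indirect = corr_linear_functions([1.0, 0.0], [0.0, 1.0], [2.0, 3.0], correlated)
```

In the first case the whole correlation comes from the shared factor; in the second it comes entirely from the correlation between the two factors, which is the distinction between direct and correlational contributions drawn in the text.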

We conclude that both the standardized and concrete coefficients 
for describing relationships between variables are useful and that the 
rejection of either would impoverish the theory. 




THE USE OF PATH COEFFICIENTS IN ANALYSIS 


We come now to the point that even where concrete path regressions 
are the sole objectives, the analysis had best be carried out in terms of 
the standardized coefficients from which the desired concrete ones may 
be derived as the final step. We give below a number of simple systems. 
Numerical subscripts are used here for the variables that are supposed 
to be measured and literal ones for the hypothetical variables including 
the residuals necessary for completion. All of the equations that can 
be written from the known correlations and from cases of complete 
determination are expressed in terms of path coefficients and residual 
correlations. They can all be written from inspection by tracing con- 
necting paths. All of them can also be written in concrete terms by use 
of formulae given above. This is done in some of the cases. Parentheses
enclose quantities that are inseparable in analysis restricted to
concrete coefficients.


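[Path diagram: $V_2 \rightarrow V_1 \rightarrow V_0$, with residual factors $V_v \rightarrow V_1$ and $V_u \rightarrow V_0$.]

FIGURE 3.


      Standardized Coefficients               Concrete Coefficients
(1)   $r_{12} = p_{12}$                       $b_{12} = c_{12}$
(2)   $r_{01} = p_{01}$                       $b_{01} = c_{01}$
(3)   $r_{02} = p_{01}p_{12}$                 $b_{02} = c_{01}c_{12}$
(4)   $r_{00} = 1 = p_{01}^2 + p_{0u}^2$      $\sigma_0^2 = c_{01}^2\sigma_1^2 + (c_{0u}^2\sigma_u^2)$
(5)   $r_{11} = 1 = p_{12}^2 + p_{1v}^2$      $\sigma_1^2 = c_{12}^2\sigma_2^2 + (c_{1v}^2\sigma_v^2)$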

In this case, the first three equations are equally simple with
concrete and standard coefficients and overdetermine two paths. One
may (a) obtain a compromise solution as by the method of least squares
or (b) may attribute any inconsistency to correlations between $V_u$ and
the variables back of $V_1$ (which must compensate in such a way that
$r_{1u} = 0$ as required by the definition of $V_u$ as a residual) or (c) may
assume that the measurements of $V_1$ are in error. The hypothesis
of errors in either $V_0$ or $V_2$ does not resolve any inconsistency of
equations (1), (2) and (3).

The appropriate diagram and equations under (b) above are as
follows:

[Path diagram: $V_2 \rightarrow V_1 \rightarrow V_0$, with residual factors $V_v \rightarrow V_1$ and $V_u \rightarrow V_0$, and with $V_u$ correlated with $V_2$ and with $V_v$.]

FIGURE 4.


(1) $r_{12} = p_{12}$,                                (4) $r_{00} = 1 = p_{01}^2 + p_{0u}^2$,
(2) $r_{11} = 1 = p_{12}^2 + p_{1v}^2$,               (5) $r_{02} = p_{01}p_{12} + p_{0u}r_{2u}$,
(3) $r_{01} = p_{01}$ (since $r_{1u} = 0$),           (6) $r_{1u} = 0 = p_{12}r_{2u} + p_{1v}r_{uv}$.

A necessary condition for solution, that there be at least as many
independent equations as paths, is met. This is not, in general, a
sufficient condition since a system may be underdetermined in one
part and overdetermined in another. In this case, however, the
unknown path coefficients and correlations can be obtained in succession
from the above equations.

The hypothesis that inconsistency of (1), (2) and (3) under Figure
3 is due to errors of measurement of $V_1$ is represented in Figure 5 and
in the equations below.


> 


Vv, V, 


FIGURE 5. 


(1) Tou = PooPia (4) Too = 1 Pia + Dow 
(2) te = PiaPaz (5) th 1 


Pio + Diw 
1 Deo + Dar 


These again are easily solved. Turner and Stevens discuss the 
effects of errors of measurement but not by means of path analysis, 
which if attempted with concrete coefficients is encumbered with 
symbols for variances. Thus equation (1) becomes 


(3) To = PooPa2 (6) Toa 


2; 2 2 
bor = OF COVo: = 


There is, of course, indeterminancy if both (b) and (c) are assumed.
Encumbrance with unnecessary variances occurs wherever two
variables trace to a third. The simplest case is that shown in Figure 6.

[Path diagram: $V_1 \leftarrow V_3 \rightarrow V_2$, with residual factors $V_u \rightarrow V_1$ and $V_v \rightarrow V_2$.]

FIGURE 6.


      Standardized Coefficients               Concrete Coefficients
(1)   $r_{13} = p_{13}$                       $b_{13} = c_{13}$
(2)   $r_{23} = p_{23}$                       $b_{23} = c_{23}$
(3)   $r_{12} = p_{13}p_{23}$                 $b_{12} = c_{13}c_{23}\sigma_3^2/\sigma_2^2$
(4)   $r_{11} = 1 = p_{13}^2 + p_{1u}^2$      $\sigma_1^2 = c_{13}^2\sigma_3^2 + (c_{1u}^2\sigma_u^2)$
(5)   $r_{22} = 1 = p_{23}^2 + p_{2v}^2$      $\sigma_2^2 = c_{23}^2\sigma_3^2 + (c_{2v}^2\sigma_v^2)$

There is overdetermination of two of the paths. Again a compromise
solution may be obtained if there is confidence in the diagram. If not,
the latter may be revised to indicate a correlation, $r_{uv}$, between the
residuals or to indicate that there are errors of measurement in the
intermediary $V_3$. A solution can be obtained under either of the latter
two hypotheses (but not both at once).

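A common situation is represented in Figure 7.

[Path diagram: correlated factors $V_1$ and $V_2$, both acting on $V_0$, with residual factor $V_u \rightarrow V_0$.]

FIGURE 7.


      Standardized Coefficients                               Concrete Coefficients
(1)   $r_{01} = p_{01} + p_{02}r_{12}$                        $b_{01} = c_{01} + c_{02}b_{21}$
(2)   $r_{02} = p_{01}r_{12} + p_{02}$                        $b_{02} = c_{01}b_{12} + c_{02}$
(3)   $r_{12}$ given                                          $b_{21}$ and $b_{12}$ given
(4)   $r_{00} = 1 = p_{01}r_{01} + p_{02}r_{02} + p_{0u}^2$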

The coefficients pertaining to all of the paths are readily calculated.
This is a simple example of the patterns characteristic of multiple
regression which are always easily solvable since only linear equations
are involved. The equations (excluding 4) can be written as simply in
terms of the regressions as in terms of the standardized coefficients but
the former introduce an arbitrary asymmetry which is undesirable.
Where only symmetrical patterns of this sort are being dealt with, the
simplest procedure is, no doubt, to calculate the concrete coefficients
directly from the ordinary normal equations of the method of least
squares. This, however, takes us away from the sort of interpretive
analysis which we are here considering, in which such symmetrical
patterns are merely special cases.
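
The multiple regression pattern of Figure 7 can be solved in a few lines. The following is a minimal sketch in Python, not part of the original paper, and it assumes the equations read as $r_{01} = p_{01} + p_{02}r_{12}$, $r_{02} = p_{01}r_{12} + p_{02}$ and $r_{00} = 1 = p_{01}r_{01} + p_{02}r_{02} + p_{0u}^2$. The function names and the numerical correlations are hypothetical. The concrete path regression is obtained as the final step by multiplying by the ratio of standard deviations.

```python
import math


def solve_two_factor_paths(r01, r02, r12):
    """Solve the two linear equations for p01 and p02, then the residual path."""
    denom = 1.0 - r12 ** 2
    if denom <= 0.0:
        raise ValueError("r12 must lie strictly between -1 and 1")
    p01 = (r01 - r02 * r12) / denom
    p02 = (r02 - r01 * r12) / denom
    residual_sq = 1.0 - p01 * r01 - p02 * r02
    if residual_sq < 0.0:
        raise ValueError("correlations are inconsistent with the diagram")
    return p01, p02, math.sqrt(residual_sq)


def to_concrete(path_coefficient, sd_effect, sd_cause):
    """Convert a standardized path coefficient to a concrete path regression."""
    return path_coefficient * sd_effect / sd_cause


# Hypothetical correlations and standard deviations, for illustration only.
p01, p02, p0u = solve_two_factor_paths(r01=0.5, r02=0.4, r12=0.2)
c01 = to_concrete(p01, sd_effect=2.0, sd_cause=4.0)
```

Substituting the solved values back into the first two equations reproduces the assumed correlations, which is a convenient check on any hand calculation of this kind.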

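Figure 8 is an example of another sort of symmetrical case.

[Path diagram: a hypothetical general factor $V_G$ acting on $V_1$, $V_2$ and $V_3$, each of which also has its own residual factor ($V_u$, $V_v$, $V_w$).]

FIGURE 8.


(1) $r_{12} = p_{1G}p_{2G}$,        (4) $r_{11} = 1 = p_{1G}^2 + p_{1u}^2$,
(2) $r_{13} = p_{1G}p_{3G}$,        (5) $r_{22} = 1 = p_{2G}^2 + p_{2v}^2$,
(3) $r_{23} = p_{2G}p_{3G}$,        (6) $r_{33} = 1 = p_{3G}^2 + p_{3w}^2$.


There are six paths and six equations which permit solution for the
standardized coefficients. Concrete coefficients can not be used in this
case because of ignorance of the variance of the hypothetical general
factor $V_G$.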
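
The six equations of the one-factor pattern can be solved in closed form. The following is a minimal sketch in Python, not part of the original paper, assuming the three correlations are products of the paths from the general factor, for example $r_{12} = p_{1G}p_{2G}$, so that $p_{1G}^2 = r_{12}r_{13}/r_{23}$. The function name, the sign convention (the first path is taken as positive) and the numerical values are illustrative assumptions.

```python
import math


def one_factor_paths(r12, r13, r23):
    """Paths from a single general factor to three measured variables.

    Returns (general, residual): the three paths from the general factor
    and the three residual paths obtained from complete determination.
    """
    if r12 == 0.0 or r13 == 0.0 or r23 == 0.0:
        raise ValueError("all three correlations must be non-zero")
    p1_sq = r12 * r13 / r23
    if not 0.0 < p1_sq <= 1.0:
        raise ValueError("correlations are inconsistent with one general factor")
    p1 = math.sqrt(p1_sq)      # sign convention: first path positive
    p2 = r12 / p1
    p3 = r13 / p1
    general = (p1, p2, p3)
    if any(p * p > 1.0 for p in general):
        raise ValueError("correlations are inconsistent with one general factor")
    residual = tuple(math.sqrt(1.0 - p * p) for p in general)
    return general, residual


# Correlations generated from hypothetical paths 0.8, 0.7 and 0.6.
general, residual = one_factor_paths(r12=0.56, r13=0.48, r23=0.42)
```

No variances enter the calculation, which is why the standardized form remains usable even though the variance of the hypothetical general factor is unknown.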

This is a simple example of the conventional pattern of factor
analysis with one general factor as proposed by Spearman [1904].
There is overdeterminancy if there are more known variables. A
solution that maximizes the sum of the squared path coefficients that
relate the known variables to the general factor has been given by
Hotelling [1936]. Additional general factors may be postulated. The
number of equations that can be written, given m known variables, is
(1/2)m(m + 1). The number of coefficients to be determined with n
common factors and m residuals is m(n + 1). There may be exact
determinancy, under or overdeterminancy. Hotelling has shown that
with any number of known variables there is complete determinancy
by the same number of factors if the sums of the squared path
coefficients relating to the factors are successively maximized. Other
conventions for arriving at a unique solution have been given.

The first paper on path coefficients (as square roots of coefficients of
direct determination) (Wright [1918]) dealt with material of the sort
to which factor analysis is applied (all possible correlations in a set of
bone measurements in a rabbit population), but from a less symmetrical
viewpoint. This has been developed in later papers (Wright [1932,
1954]).

SUMMARY 


The method of path coefficients and its more important pitfalls are
reviewed briefly with reference to recent misunderstandings.

Reasons are discussed for looking upon standardized coefficients
(correlations, path coefficients) and concrete ones (total and path
regressions) as aspects of a single theory rather than as alternatives
between which a choice should be made. They correspond to different
modes of interpretation which taken together give a deeper
understanding of a situation than either can give by itself.

It is brought out that even where the sole objectives of analysis are
the concrete coefficients, actual path analysis takes a simpler and more
homogeneous form in terms of the standardized ones, which can easily
be converted into the concrete forms as the final step.


ESTIMATING THE PARAMETER IN A CONDITIONAL
POISSON DISTRIBUTION*


A. COHEN, JR.
The University of Georgia 
Athens, Georgia, U.S.A. 


1. INTRODUCTION 


The problem of estimating the Poisson parameter when zero values 
of the random variable may not be observed has recently been the 
subject of considerable attention. Samples from distributions of the 
number of persons per residence suffering from a contagious disease 
and of the number of accidents per worker in a factory during specified 
intervals of time are of this type. Such samples have been considered 
by David and Johnson [3], Hartley [5], Moore [8], Plackett [9], Rider
[10], Tate and Goen [11], and, while this paper was being processed for
publication, by Irwin [6]. They constitute a special case of the various
types of restricted Poisson samples studied earlier by this author [2].
In the author’s previous paper, maximum likelihood estimating
equations were derived for truncated and censored samples in both singly
and doubly restricted cases. However, tables necessary for the rapid
and easy solution of these equations were not provided at that time.

David and Johnson, Rider, and Moore seemed primarily interested
in estimates that are easy to calculate. When necessary, they appeared
willing to make slight sacrifices in efficiency for the sake of easier
calculations. Plackett, and Tate and Goen were further concerned
with unbiased estimators. Maximum likelihood estimators were
dismissed as being unsuited for ordinary practical use because of bias
and the burdensome calculations involved.

Admittedly, maximum likelihood estimates are troublesome to
calculate without proper tables since it is necessary to solve a
somewhat complicated non-linear equation. These estimates, however,
are consistent and asymptotically efficient. Except in small samples,
bias appears unlikely to be a major source of trouble. Therefore when
samples are large, if the labor involved in calculating maximum
likelihood estimates can be sufficiently reduced, there appears to be little
justification for employing any other estimator. Furthermore, one
might quite appropriately question the advisability of even attempting
to estimate the Poisson parameter from very small samples since
sampling errors inherent in these estimates are of such magnitude as
to limit the usefulness of results obtained regardless of the estimator
employed.

*Sponsored by the Office of Ordnance Research, U. S. Army.

The reason for reexamining the problem under consideration here
is to provide tables and charts necessary to render calculation of
maximum likelihood estimates feasible and to emphasize the requirement
for large samples. Abbreviated tables with entries to three decimals
were given by David and Johnson [3] and by Rider [10], but they are
hardly adequate for general use.

Irwin [6] gave an explicit solution for the maximum likelihood
estimating equation in the form of a Lagrange series, but convergence
is slow, particularly for small values of $\lambda$. Because of the number of
terms of the expansion which therefore must be evaluated, calculations
using his results are still troublesome. The tables and charts presented
here greatly reduce the labor of computing maximum likelihood
estimates in practical applications.

Hartley [5] gave a general iterative procedure for obtaining maximum
likelihood estimates of parameters of any type of population. However,
calculations using his method are rather laborious in comparison with
linear interpolation in the tables provided here, a procedure which is
adequate in the case under consideration. Unless one is fortunate
enough to begin with a close first approximation, calculations using
Hartley’s method are likely to be tedious and time consuming.


2. MAXIMUM LIKELIHOOD ESTIMATION 


The conditional Poisson probability function under consideration
here may be written as

$$p(x) = \frac{\lambda^x e^{-\lambda}}{x!\,(1 - e^{-\lambda})}, \qquad x = 1, 2, \ldots. \qquad (1)$$

The likelihood function for a sample consisting of n observations of
random variable x, having probability function (1), may be written as

$$P = \prod_{i=1}^{n} p(x_i) = \frac{\lambda^{n\bar{x}} e^{-n\lambda}}{(1 - e^{-\lambda})^n \prod_{i=1}^{n} x_i!}. \qquad (2)$$

Taking logarithms of (2) and differentiating, we obtain

$$\partial L/\partial\lambda = -n/(1 - e^{-\lambda}) + n\bar{x}/\lambda, \qquad (3)$$

where L has been written for ln P. On setting (3) equal to zero, we
obtain the estimating equation

$$\hat{\lambda}/(1 - e^{-\hat{\lambda}}) = \bar{x}, \qquad (4)$$

where $\bar{x}$ is the sample mean ($\bar{x} = \sum_{i=1}^{n} x_i/n$). The required estimate,
which we designate $\hat{\lambda}$ as distinguished from $\lambda$, the parameter being
estimated, must then be a positive real root of (4). Incidentally,
equation (4) above is a special case of equation (10) of reference [2].
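
With a computer at hand, the estimating equation can also be solved directly. The following is a minimal sketch in Python and is not the tabular method of the paper: it finds the positive root of $\lambda/(1 - e^{-\lambda}) = \bar{x}$ by bisection, a swapped-in technique chosen because the left side increases steadily from 1 and always exceeds $\lambda$, so the root lies between 0 and $\bar{x}$. The function name and the iteration count are illustrative assumptions.

```python
import math


def truncated_poisson_mle(xbar, iterations=200):
    """Positive root of lam / (1 - exp(-lam)) = xbar, found by bisection."""
    if xbar <= 1.0:
        raise ValueError("the sample mean of a zero-truncated sample must exceed 1")

    def excess(lam):
        # -expm1(-lam) equals 1 - exp(-lam) and stays accurate for small lam.
        return lam / -math.expm1(-lam) - xbar

    low, high = 1e-12, xbar   # excess(low) < 0 and excess(high) > 0
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if excess(mid) < 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


lam_hat = truncated_poisson_mle(122 / 91)
```

For a sample mean of 122/91 the root agrees with the three-decimal estimate 0.618 obtained later in the paper by interpolation in Table 1.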

In order to simplify the solution of (4) for given values of the sample
mean $\bar{x}$, Table 1 has been prepared with the aid of Molina’s Tables of


TABLE 1
$\bar{x} = \hat{\lambda}/(1 - e^{-\hat{\lambda}})$
(each pair of columns gives $\bar{x}$ followed by $\hat{\lambda}$)


               | 1.155 0.2955 | 1.46 0.8115 | 2.30 1.9836 | 4.30 4.2379 |  7.6  7.5962
1.0005  0.0010 | 1.160 0.3046 | 1.47 0.8273 | 2.35 2.0464 | 4.35 4.2904 |  7.7  7.6965
1.0010  0.0020 | 1.165 0.3137 | 1.48 0.8430 | 2.40 2.1086 | 4.40 4.3428 |  7.8  7.7968
1.0015  0.0030 | 1.170 0.3227 | 1.49 0.8586 | 2.45 2.1703 | 4.45 4.3951 |  7.9  7.8971
1.0020  0.0040 | 1.175 0.3317 | 1.50 0.8742 | 2.50 2.2316 | 4.50 4.4473 |  8.0  7.9973
1.0025  0.0050 | 1.180 0.3407 | 1.51 0.8897 | 2.55 2.2924 | 4.55 4.4994 |  8.1  8.0976
1.0030  0.0060 | 1.185 0.3497 | 1.52 0.9052 | 2.60 2.3527 | 4.60 4.5515 |  8.2  8.1977
1.0035  0.0070 | 1.190 0.3586 | 1.53 0.9207 | 2.65 2.4126 | 4.65 4.6034 |  8.3  8.2979
1.0040  0.0080 | 1.195 0.3675 | 1.54 0.9361 | 2.70 2.4721 | 4.70 4.6553 |  8.4  8.3981
1.0045  0.0090 | 1.200 0.3764 | 1.55 0.9514 | 2.75 2.5312 | 4.75 4.7071 |  8.5  8.4983
1.005   0.0100 | 1.205 0.3853 | 1.56 0.9667 | 2.80 2.5899 | 4.80 4.7588 |  8.6  8.5984
1.010   0.0199 | 1.210 0.3942 | 1.57 0.9819 | 2.85 2.6483 | 4.85 4.8105 |  8.7  8.6986
1.015   0.0299 | 1.215 0.4030 | 1.58 0.9970 | 2.90 2.7063 | 4.90 4.8622 |  8.8  8.7987
1.020   0.0397 | 1.220 0.4118 | 1.59 1.0121 | 2.95 2.7640 | 4.95 4.9137 |  8.9  8.8988
1.025   0.0496 | 1.225 0.4206 | 1.60 1.0272 | 3.00 2.8214 | 5.00 4.9652 |  9.0  8.9989
1.030   0.0594 | 1.230 0.4294 | 1.61 1.0422 | 3.05 2.8786 | 5.1  5.0679 |  9.1  9.0990
1.035   0.0692 | 1.235 0.4381 | 1.62 1.0571 | 3.10 2.9354 | 5.2  5.1704 |  9.2  9.1991
1.040   0.0790 | 1.240 0.4468 | 1.63 1.0720 | 3.15 2.9919 | 5.3  5.2728 |  9.3  9.2992
1.045   0.0887 | 1.245 0.4555 | 1.64 1.0869 | 3.20 3.0482 | 5.4  5.3750 |  9.4  9.3992
1.050   0.0984 | 1.250 0.4642 | 1.65 1.1017 | 3.25 3.1042 | 5.5  5.4770 |  9.5  9.4993
1.055   0.1081 | 1.26  0.4815 | 1.66 1.1165 | 3.30 3.1600 | 5.6  5.5789 |  9.6  9.5993
1.060   0.1177 | 1.27  0.4987 | 1.67 1.1312 | 3.35 3.2156 | 5.7  5.6806 |  9.7  9.6994
1.065   0.1273 | 1.28  0.5158 | 1.68 1.1458 | 3.40 3.2709 | 5.8  5.7821 |  9.8  9.7995
1.070   0.1369 | 1.29  0.5329 | 1.69 1.1604 | 3.45 3.3259 | 5.9  5.8836 |  9.9  9.8995
1.075   0.1464 | 1.30  0.5499 | 1.70 1.1750 | 3.50 3.3808 | 6.0  5.9849 | 10.0  9.9995
1.080   0.1559 | 1.31  0.5668 | 1.71 1.1895 | 3.55 3.4356 | 6.1  6.0871 | 10.1 10.0996
1.085   0.1654 | 1.32  0.5836 | 1.72 1.2040 | 3.60 3.4902 | 6.2  6.1873 | 10.2 10.1996
1.090   0.1749 | 1.33  0.6003 | 1.73 1.2185 | 3.65 3.5446 | 6.3  6.2:    | 10.3 10.2997
1.095   0.1843 | 1.34  0.6170 | 1.74 1.2329 | 3.70 3.5988 | 6.4  6.3893 | 10.4 10.3997
1.100   0.1937 | 1.35  0.6335 | 1.75 1.2472 | 3.75 3.6528 | 6.5  6.4902 | 10.5 10.4997
1.105   0.2031 | 1.36  0.6500 | 1.80 1.3184 | 3.80 3.7067 | 6.6  6.5910 | 10.6 10.5997
1.110   0.2125 | 1.37  0.6665 | 1.85 1.3885 | 3.85 3.7604 | 6.7  6.6917 | 10.7 10.6998
1.115   0.2218 | 1.38  0.6829 | 1.90 1.4578 | 3.90 3.8140 | 6.8  6.7924 | 10.8 10.7998
1.120   0.2311 | 1.39  0.6992 | 1.95 1.5261 | 3.95 3.8674 | 6.9  6.8930 | 10.9 10.8998
1.125   0.2404 | 1.40  0.7154 | 2.00 1.5936 | 4.00 3.9207 | 7.0  6.9936 | 11.0 10.9998
1.130   0.2496 | 1.41  0.7316 | 2.05 1.6603 | 4.05 3.9739 | 7.1  7.0942 | 11.2 11.1998
1.135   0.2588 | 1.42  0.7477 | 2.10 1.7263 | 4.10 4.0269 | 7.2  7.1946 | 11.3 11.2999
1.140   0.2680 | 1.43  0.7637 | 2.15 1.7916 | 4.15 4.0798 | 7.3  7.2951 | 11.5 11.4999
1.145   0.2772 | 1.44  0.7797 | 2.20 1.8562 | 4.20 4.1326 | 7.4  7.3955 | 12.0 11.9999
1.150   0.2864 | 1.45  0.7956 | 2.25 1.9202 | 4.25 4.1853 | 7.5  7.4958 | 12.5 12.5000




206 BIOMETRICS, JUNE: 1960 


the Poisson Function [4] and the W.P.A. Tables of the Exponential
Function [12]. It is necessary only that we enter our table with the
sample value $\bar{x}$ and read $\hat{\lambda}$ directly. Linear interpolation will
ordinarily yield an accuracy of four (at least three) significant digits in
this value.


For use when a quick solution of the estimating equation is desired
and when a slight sacrifice in accuracy is permissible, a folded scale
chart of $\delta$ as a function of $\bar{x}$ is given in Figure 1, where

$$\delta = \bar{x} - \hat{\lambda}.$$

Thus with $\bar{x}$ given, $\delta$ can be read from this chart and the required
estimate follows as $\hat{\lambda} = \bar{x} - \delta$. By plotting $\delta$ rather than $\hat{\lambda}$, it has
been possible to achieve a higher degree of accuracy in a chart of fixed
size.

[FIGURE 1. Folded scale chart of $\delta$ as a function of $\bar{x}$. Instructions printed on the chart: (1) locate the sample value of $\bar{x}$ on the horizontal scale; (2) project vertically to intersect the appropriate graph segment; (3) project the intersection horizontally to the corresponding vertical scale and read $\delta$. Scales labeled "A" apply to graph segment "A". Example printed on the chart: given $\bar{x} = 3.034$, read $\delta = 0.175$ and compute $\hat{\lambda} = 3.034 - 0.175 = 2.859$, a value which differs by only one unit in the third decimal from the value obtained by linear interpolation in Table 1.]

3. VARIANCE OF THE ESTIMATE

The asymptotic variance of $\hat{\lambda}$ may be expressed as

$$V(\hat{\lambda}) = -1/E(\partial^2 L/\partial\lambda^2). \qquad (5)$$

The second partial derivative follows from (3) as

$$\partial^2 L/\partial\lambda^2 = n e^{-\lambda}/(1 - e^{-\lambda})^2 - n\bar{x}/\lambda^2. \qquad (6)$$

Therefore

$$V(\hat{\lambda}) \sim \psi(\lambda)[\lambda/n], \qquad (7)$$

where

$$\psi(\lambda) = (1 - e^{-\lambda})^2/[1 - (\lambda + 1)e^{-\lambda}]. \qquad (8)$$

We note that $\psi(\lambda)$ is continuous and monotonic decreasing, that
$\lim_{\lambda \to 0} \psi(\lambda) = 2$, and $\lim_{\lambda \to \infty} \psi(\lambda) = 1$. Therefore, regardless of the
value of $\lambda$, the asymptotic variance satisfies the inequality

$$\lambda/n < V(\hat{\lambda}) < 2\lambda/n. \qquad (9)$$


TABLE 2
THE VARIANCE FUNCTION
$\psi(\lambda) = (1 - e^{-\lambda})^2/[1 - (\lambda + 1)e^{-\lambda}]$


 $\lambda$   $\psi(\lambda)$  |  $\lambda$   $\psi(\lambda)$  |  $\lambda$   $\psi(\lambda)$
 0     2.000000  |  1.0   1.512159  |   7
 0.1   1.935503  |  1.5   1.364906  |   8    1.002018
 0.2   1.875156  |  2.0   1.258674  |   9    1.000988
 0.3   1.818676  |  2.5   1.182216  |  10    1.000409
 0.4   1.765808  |  3.0   1.127426  |  14.5  1.000007
 0.5   1.716315  |  3.5   1.088421  |
 0.6   1.669964  |  4.0   1.060855  |
 0.7   1.626561  |  4.5   1.041538  |
 0.8   1.585911  |  5     1.028135  |
 0.9   1.547833  |  6     1.012619  |


[FIGURE 2. GRAPH OF VARIANCE FUNCTION $\psi(\lambda) = (1 - e^{-\lambda})^2/[1 - (\lambda + 1)e^{-\lambda}]$]


To facilitate calculation of variances of estimates, Table 2, which is
an abbreviated table of $\psi(\lambda)$, has been included. In addition, a graph
of this function is given as Figure 2.
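
Where neither the table nor the graph is at hand, the variance function can be evaluated directly. The following is a minimal sketch in Python, not part of the original paper, assuming $\psi(\lambda) = (1 - e^{-\lambda})^2/[1 - (\lambda + 1)e^{-\lambda}]$ and an asymptotic variance of $\psi(\lambda)\lambda/n$ with $\lambda$ replaced by its estimate. The function names are hypothetical.

```python
import math


def psi(lam):
    """Variance function: ratio of the asymptotic variance to lam / n."""
    if lam <= 0.0:
        raise ValueError("lam must be positive")
    one_minus_exp = -math.expm1(-lam)            # 1 - exp(-lam)
    return one_minus_exp ** 2 / (1.0 - (lam + 1.0) * math.exp(-lam))


def asymptotic_sd(lam_hat, n):
    """Approximate standard deviation of the estimate from n non-zero observations."""
    return math.sqrt(psi(lam_hat) * lam_hat / n)


# Values used in the first illustrative example of the paper.
sd_truncated = asymptotic_sd(0.618, 91)
```

With an estimate of 0.618 from 91 non-zero observations this gives a value of $\psi$ close to the 1.66 read from the chart and a standard deviation close to 0.106.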


4. ILLUSTRATIVE EXAMPLES


To illustrate the practical application of these results, Bortkiewicz’s
[1] classic example on deaths from the kick of a horse in the Prussian
Army has been selected. These deaths were collected from records of
a certain group of ten Prussian Army Corps over the twenty year
period from 1875-1894. A total of 122 deaths were recorded in the 200
annual reports included in the study, and the mean number of deaths
per army corps per year is 122/200 = 0.610. On the basis of the full
(unrestricted) sample, this is the estimate of $\lambda$. Following is a tabulation
of the complete data for this example.


No. deaths per Army
Corps per year          No. observations

      x                       f
      0                     109
      1                      65
      2                      22
      3                       3
      4                       1
      5                       0

For the purposes of this illustration, zero observations are eliminated
and there remains a sample consisting of n = 91 observations with

$$\bar{x} = 122/91 = 1.3407,$$

which we consider as being from a conditional Poisson distribution
with probability function (1). Perhaps the sample thus formed might
more appropriately be considered simply as a truncated Poisson sample
from which the zero class is missing. Regardless of the point of view
adopted, estimating equation (4) is applicable.

Entering Table 1 with $\bar{x} = 1.3407$, we interpolate linearly as
summarized below and round off to three decimals to obtain $\hat{\lambda} = 0.618$, which
is to be compared with the complete sample estimate of 0.610.


   $\bar{x}$        $\hat{\lambda}$
 1.3500    0.6335
 1.3407    0.6182
 1.3400    0.6170

The chart of Figure 1 can be employed to read $\delta = 0.72$, from which
it follows that $\hat{\lambda} = 1.34 - 0.72 = 0.62$, a value that is correct to the
two decimals given.

We employ (7) with $\lambda$ replaced by its estimate 0.618 in calculating
the variance of $\hat{\lambda}$. From the chart of Figure 2, we read $\psi(0.618) = 1.66$
and accordingly


$$V(\hat{\lambda}) = 1.66[0.618/91] = 0.0113,$$
$$\sigma_{\hat{\lambda}} = \sqrt{V(\hat{\lambda})} = 0.106.$$

For comparison, we compute the standard deviation of $\hat{\lambda}$ based
on the full unrestricted sample of 200 observations, and thus obtain


$$\sigma_{\hat{\lambda}} = \sqrt{0.610/200} = 0.055,$$


a value which represents a reduction of almost one-half from the 
standard deviation of the estimate based only on the 91 non-zero 
observations. The comparison afforded here serves to emphasize the 
necessity for large samples if reliable estimates of the Poisson parameter 
are to be obtained when the zero class is missing. 

For an additional illustration, we consider the distribution of eggs
laid in the unopened flower heads of the black knapweed by the
Knapweed gall-fly in two different years, 1935 and 1936, studied by Finney
and Varley [4]. The number of flower heads in which no eggs were laid
is unavailable, and the 1935 data consisted of 148 observations for which
the number of eggs is one or more with $\bar{x} = 3.020$. Linear interpolation
in Table 1 immediately yields $\hat{\lambda} = 2.8443$ as compared with 2.845
given by Finney and Varley. More accurate calculations using entries
from Tables of the Exponential Function [12] subsequently verified
the interpolated value given here to be correct. The chart of Figure 1
can be employed to read $\delta = 0.18$, from which it follows that
$\hat{\lambda} = 3.02 - 0.18 = 2.84$, a result correct to two decimals.

The 1936 data consisted of 88 flower heads with $\bar{x} = 3.034$. Linear
interpolation in Table 1 gives $\hat{\lambda} = 2.8603$ which, when rounded off to
three decimals, agrees exactly with the corresponding estimate
calculated by Finney and Varley. Again the chart of Figure 1 can be
employed to read $\delta = 0.175$, from which it follows that
$\hat{\lambda} = 3.034 - 0.175 = 2.859$, a result correct to within one unit in the third decimal.


MODELS FOR THE INTERPRETATION OF EXPERIMENTS
USING TRACER COMPOUNDS¹


JEROME CORNFIELD,² JESSE STEINFELD,³ AND SAMUEL W. GREENHOUSE


National Institutes of Health 
Bethesda, Maryland, U.S.A. 


1. Introduction 


The early experiments of Schoenheimer and his colleagues [1] using 
isotopically labelled compounds led to the conclusion that many 
body constituents, despite an apparent constancy, were in a state of 
rapid flux. Thus, labelled hydrogen atoms were found to interchange 
quite rapidly among different fatty acids which, in view of these results, 
could no longer be regarded as metabolically inert. Such results made 
it natural to inquire about the various states in which any particular 
substance could exist and in particular about the total amount to be 
found in each state, and the rate at which it moved back and forth 
among the states. It was hoped that a deeper understanding of many 
biological processes might result. 

It has been an interesting, and, for those engaged in the biological 
sciences, a somewhat novel experience to discover that the realization 
of this program requires the formulation of explicit quantitative models 
of the phenomena being studied. It is generally recognized that measure- 
ment always implies a theory which indicates what one measures and 
why one uses one way in preference to another. But in many areas of 
biology it has not been necessary to make this underlying theory 
explicit, the exceptional areas, such as genetics or biological assay, 
involving only a minority of biological workers. In experiments involv- 
ing the use of tracers the estimation procedures used depend, however, 
in a non-trivial way on this body of theory. It is the purpose of the 
present paper to (a) sketch the kind of theoretical model required for 
such estimation, paying particular attention to the biological assump- 
tions, (b) to illustrate the essential dependence of the value of an 
apparently simple and well-defined magnitude on the model assumed
and (c) to consider some statistical issues involved in the application
of the models to experimental data.


¹A revision of a paper prepared for presentation at the 30th Session of the International Statistical
Institute in Stockholm, August 8-15, 1957.

²Present address, Department of Biostatistics, School of Hygiene and Public Health, The Johns
Hopkins University.


2. Some results with I¹³¹ labelled albumin


We show in Figures 1 and 2 the results of an experiment involving
the intravenous injection into a single human subject of a dose of
albumin, tagged with radioactive iodine. Figure 1 shows the blood
concentration, in counts/min/ml blood as a percentage of the original
dose at various times after injection. Figure 2 shows the percentage
of injected dose excreted in urine each day. (The amount recovered in
feces is negligible and has been disregarded. All counts are corrected
for physical decay.)

[FIGURE 1. BLOOD CONCENTRATION (×10⁴) AS A FRACTION OF ORIGINAL DOSE]

[FIGURE 2. PERCENT OF INJECTED DOSE EXCRETED PER DAY]

A fairly simple calculation is sufficient to indicate
that the injected albumin has distributed itself to tissues other than
the blood. First, the injected dose was 3.14 × 10⁷ counts/minute.
The blood concentration 20 minutes after injection was 7431
counts/minute/ml blood. One might then say that the total volume into
which the albumin had distributed itself by 20 minutes must be 3845 ml.
(This calculation involves a 9 percent reduction in computed volume
to allow for the difference between peripheral venous hematocrit and
whole body hematocrit.) This volume, which we shall refer to as the
volume of the vascular compartment, is the volume of body fluids
which would hold albumin in solution at the same concentration as the
blood [2]. If the blood concentration at any subsequent time is
multiplied by 3845, we have an estimate of the total amount of labelled
albumin in the vascular compartment at that time. But comparison
with the amount of iodine label excreted indicates that not all of the
albumin remaining in the body is in the vascular compartment. Figure
3 shows, for different times, the percentage that the amount in the
vascular compartment constitutes of the total amount remaining in the
body. It will be noted that by the fourth day less than half the body
albumin is to be found in the vascular compartment and that the
fraction seems to stabilize at a value of about 0.42. It seems reasonable
to conclude, therefore, that when distribution is completed
approximately 42 percent of the administered albumin is in the vascular
compartment and 58 percent outside it, at least in this patient.
Moreover, if labelled and normal body albumin distribute themselves in the
same way, and this is of course a postulate on which all work with tracer
compounds is based, these results may be used to estimate the
distribution of normal body albumin.
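
The volume calculation can be retraced in a few lines. The following is a minimal sketch in Python, not part of the original paper: the function name and the way the 9 percent hematocrit allowance is applied (as a simple multiplicative reduction of the computed volume) are assumptions made for illustration.

```python
def vascular_volume_ml(dose_counts_per_min, blood_counts_per_min_per_ml,
                       hematocrit_reduction=0.09):
    """Volume into which the label has distributed, reduced by the stated allowance."""
    uncorrected = dose_counts_per_min / blood_counts_per_min_per_ml
    return uncorrected * (1.0 - hematocrit_reduction)


# Injected dose and 20-minute blood concentration quoted in the text.
volume = vascular_volume_ml(3.14e7, 7431.0)

# Labelled albumin in the vascular compartment at 20 minutes, by construction
# the injected dose less the 9 percent allowance.
amount_in_vascular = volume * 7431.0
```

Under this reading the computed volume rounds to the 3845 ml given in the text.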

If we denote by V the volume of the vascular compartment, by f
the asymptotic value of the fraction of the labelled albumin in the
vascular compartment and by x the blood concentration of unlabelled
albumin, then an estimate [3] of the total amount of normal body
albumin is given by


total body albumin = $Vx/f$. (2.1)


This has been called the dilution equation [3a]. It is an interesting,
and far from obvious fact, that the estimate (2.1) contains hidden
assumptions which could lead to error, although in many cases of
practical interest the error will not be large. An alternative estimate
in common use [4, 5] takes advantage of the fact that the time course
curve of blood concentration can be represented as a linear combination
of two or more exponentials so that


concentration = $\sum_i A_i e^{-\lambda_i t}$. (2.2)


If $A_1$ stands for the intercept corresponding to the exponential with
the smallest $\lambda_i$, we may denote this estimate as


total body albumin = $Vx/[A_1/\sum_i A_i]$. (2.3)
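
The two estimates can be placed side by side numerically. The following is a minimal sketch in Python, not part of the original paper. It assumes the dilution estimate is $Vx/f$ and that the alternative estimate divides $Vx$ by the share of the summed intercepts belonging to the slowest exponential. The function names, the albumin concentration, the intercepts and the rate constants are hypothetical values chosen only to show the arithmetic; only the volume 3845 ml and the fraction 0.42 come from the text.

```python
def total_by_dilution(volume_ml, concentration, asymptotic_fraction):
    """Dilution estimate: V * x / f."""
    return volume_ml * concentration / asymptotic_fraction


def total_by_slowest_exponential(volume_ml, concentration, intercepts, rates):
    """Alternative estimate: V * x divided by the slowest term's share of the intercepts."""
    slowest = min(range(len(rates)), key=lambda i: rates[i])
    share = intercepts[slowest] / sum(intercepts)
    return volume_ml * concentration / share


# Hypothetical concentration (g/ml), intercepts and rate constants.
by_dilution = total_by_dilution(3845.0, 0.04, 0.42)
by_exponential = total_by_slowest_exponential(3845.0, 0.04,
                                              intercepts=[0.6, 0.4],
                                              rates=[1.0, 0.05])
```

With these made-up inputs the two formulas give different totals, because the share of the intercepts (0.40) is not the same number as the asymptotic fraction (0.42).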


The estimate (2.3) is not the same as that yielded by (2.1) and may in fact
differ from it by a considerable amount. This disagreement suggests
the importance of developing a model which sets down in an explicit
and formal manner the assumptions being made. Only in this way is it
possible to know what assumptions are embodied in any particular
method of estimation.


3. Multi-compartment systems⁴


We start by considering a system in which a substance is trans- 
ferred in reversible fashion between two compartments, but goes 
irreversibly from either of these to a third terminal compartment. We 
refer to this as a two-compartment scheme. That the system under 
study can be so described is the first assumption that must be made. 
Thus 


Assumption 1. Let

$Y_i(t)$ = the total amount of the substance in compartment $i$
($i = 1, 2, 3$) at time $t$,

$P_{ij}(t)$ = the gross amount going from compartment $i$ to compartment
$j$ in unit time at time $t$ [$i \neq j$, $j = 1, 2, 3$, $i \neq 3$, and $P_{3j}(t) = 0$].


(See Figure 4a.) In terms of the example of the previous section, $Y_1(t)$
can be thought of as the amount of unlabelled albumin in the vascular
compartment, $Y_2(t)$ the amount in the extravascular compartment, and
$Y_3(t)$ the cumulative amount of a metabolized product containing the
label which is excreted up to time $t$. Now introduce amount $Z$ of a
labelled version of the substance into compartment 1 and denote the
amount of it in compartment $i$ at time $t$ by $y_i(t)$.

FIGURE 4.
Schemata for transfer of unlabelled and labelled substance: (a) unlabelled; (b) labelled.

⁴ A complete bibliography on multi-compartment systems as applied to tracer experiments is
beyond the scope of this paper. The paper by Gellhorn et al [6] is the earliest that we have been able
to find, although the use of systems of simultaneous differential equations in chemical kinetics is much
older. Of special interest to multi-compartment systems are references [7, 8, 9, 10, 11, and 12]. The
last reference covers questions of much broader scope than tracer experiments.

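Assumption 2. The fraction of substance in compartment $i$ going to
compartment $j$ in unit time is the same for labelled and unlabelled
substance. This is the fundamental assumption of isotope work and
is sufficient to determine the equations describing the movement of
labelled substance. Thus by this assumption, the amount of label
moving from compartment 1 to compartment 2 in unit time at time $t$
is (see Figure 4b)

$$y_1(t)\,\frac{P_{12}(t)}{Y_1(t)}. \tag{3.1}$$

The net changes in amount of label per unit time for each compartment
are thus

$$\begin{aligned}
\frac{\Delta y_1(t)}{\Delta t} &= -y_1(t)\,\frac{P_{12}(t) + P_{13}(t)}{Y_1(t)} + y_2(t)\,\frac{P_{21}(t)}{Y_2(t)}, \\
\frac{\Delta y_2(t)}{\Delta t} &= y_1(t)\,\frac{P_{12}(t)}{Y_1(t)} - y_2(t)\,\frac{P_{21}(t) + P_{23}(t)}{Y_2(t)}, \\
\frac{\Delta y_3(t)}{\Delta t} &= y_1(t)\,\frac{P_{13}(t)}{Y_1(t)} + y_2(t)\,\frac{P_{23}(t)}{Y_2(t)}.
\end{aligned} \tag{3.2}$$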


Even after going to the limit these equations cannot be integrated unless
something is known about the functional dependence of the $P$’s and
$Y$’s on $t$. For what follows it is sufficient to make

Assumption 3. The fractional amount of unlabelled substance going
from compartment $i$ to compartment $j$ in unit time is independent of
time. If we denote $P_{ij}/Y_i$ by $k_{ij}$ and proceed to the limit we may then
rewrite equations (3.2) as

$$\begin{aligned}
\frac{dy_1}{dt} &= -(k_{12} + k_{13})\,y_1 + k_{21}\,y_2, \\
\frac{dy_2}{dt} &= k_{12}\,y_1 - (k_{21} + k_{23})\,y_2, \\
y_3 &= \int_0^t [k_{13}\,y_1 + k_{23}\,y_2]\,dt,
\end{aligned} \tag{3.3}$$


where the $k_{ij}$ are constants independent of time. By analogy with
formally identical quantities in chemical kinetics they are often referred
to as rate constants or rate parameters. It is also possible to think of
the $k_{ij}\,dt$ as the transition probabilities of a stationary Markov process
[13] with matrix

$$\begin{bmatrix}
1 - (k_{12} + k_{13})\,dt & k_{12}\,dt & k_{13}\,dt \\
k_{21}\,dt & 1 - (k_{21} + k_{23})\,dt & k_{23}\,dt \\
0 & 0 & 1
\end{bmatrix} \tag{3.3.1}$$


It is curious that when considered from the point of view of equations 
(3.3) the process is sometimes referred to as deterministic, but when 
considered from the formally equivalent point of view of (3.3.1) as 
stochastic. 
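
The formal equivalence of the two points of view can be checked numerically. The following is a minimal sketch and not part of the original analysis: it builds the one-step matrix (3.3.1) for a small but finite `dt`, propagates the expected amounts of label from the starting state $(Z, 0, 0)$, and compares them with the solution of equations (3.3) obtained from an eigen-decomposition of the rate matrix. The function names and the numerical rate constants are hypothetical, chosen only for illustration.

```python
import numpy as np


def transition_matrix(k12, k13, k21, k23, dt):
    """One-step transition matrix (3.3.1) for a small time step dt."""
    return np.array([
        [1.0 - (k12 + k13) * dt, k12 * dt, k13 * dt],
        [k21 * dt, 1.0 - (k21 + k23) * dt, k23 * dt],
        [0.0, 0.0, 1.0],
    ])


def expected_amounts_markov(Z, rates, t, steps):
    """Expected label in compartments 1, 2, 3 at time t after `steps` Markov steps."""
    step_matrix = transition_matrix(*rates, t / steps)
    start = np.array([Z, 0.0, 0.0])  # all label introduced into compartment 1
    return start @ np.linalg.matrix_power(step_matrix, steps)


def expected_amounts_ode(Z, rates, t):
    """Solution of equations (3.3) at time t via eigen-decomposition of the rate matrix."""
    k12, k13, k21, k23 = rates
    Q = np.array([
        [-(k12 + k13), k12, k13],
        [k21, -(k21 + k23), k23],
        [0.0, 0.0, 0.0],
    ])
    eigvals, eigvecs = np.linalg.eig(Q)
    exp_Qt = eigvecs @ np.diag(np.exp(eigvals * t)) @ np.linalg.inv(eigvecs)
    start = np.array([Z, 0.0, 0.0])
    return np.real(start @ exp_Qt)


if __name__ == "__main__":
    rates = (0.5, 0.1, 0.3, 0.05)  # hypothetical k12, k13, k21, k23
    print(expected_amounts_markov(1.0, rates, t=2.0, steps=20000))
    print(expected_amounts_ode(1.0, rates, t=2.0))
```

As `steps` grows the two results should approach each other, which is the sense in which (3.3) and (3.3.1) describe the same expected behavior.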

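The first two of equations (3.3) are simultaneous linear differential
equations in two unknowns. Either unknown may be eliminated to
yield the second-order linear differential equation with constant
coefficients [14]

$$\frac{d^2y}{dt^2} + (k_{12} + k_{13} + k_{21} + k_{23})\frac{dy}{dt} + [k_{13}(k_{21} + k_{23}) + k_{12}k_{23}]\,y = 0. \tag{3.4}$$

The solution of (3.4) is known to be

$$\begin{aligned}
y_1 &= A_{11}e^{-\lambda_1 t} + A_{12}e^{-\lambda_2 t}, \\
y_2 &= A_{21}e^{-\lambda_1 t} + A_{22}e^{-\lambda_2 t},
\end{aligned} \tag{3.5}$$

where $-\lambda_1$ and $-\lambda_2$ are the two roots of the characteristic equation,

$$r^2 + (k_{12} + k_{13} + k_{21} + k_{23})\,r + k_{13}(k_{21} + k_{23}) + k_{12}k_{23} = 0, \tag{3.5.1}$$

and the $A$’s are determined by the initial conditions, that at $t = 0$,
$y_1 = Z$ and $y_2 = 0$, so that $dy_1(0)/dt = -(k_{12} + k_{13})Z$ and $dy_2(0)/dt =
k_{12}Z$. Then,

$$\begin{aligned}
A_{11} &= Z(k_{12} + k_{13} - \lambda_2)/(\lambda_1 - \lambda_2), \\
A_{12} &= Z - A_{11}, \\
A_{21} &= -k_{12}Z/(\lambda_1 - \lambda_2), \\
A_{22} &= -A_{21}.
\end{aligned} \tag{3.6}$$

From the third of equations (3.3)

$$y_3 = A_{31}(1 - e^{-\lambda_1 t}) + A_{32}(1 - e^{-\lambda_2 t}), \tag{3.7}$$

where $A_{31} = (k_{13}A_{11} + k_{23}A_{21})/\lambda_1$ and $A_{32} = (k_{13}A_{12} + k_{23}A_{22})/\lambda_2$.
From (3.5.1) it follows that $\lambda_1$ and $\lambda_2$ are single-valued functions of the
$k_{ij}$ and from (3.6.1) that the same is true of the $A_{ij}$. Furthermore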



EXPERIMENTS USING TRACER COMPOUNDS 219 


it is easy to verify that the inverse solution for the $k_{ij}$ yields
single-valued functions of the $\lambda_i$ and $A_{ij}$.
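
As an illustration of how the exponents and coefficients follow from the rate constants, the sketch below evaluates the roots of (3.5.1) and the coefficients of (3.5) and (3.7) for a given set of $k_{ij}$. It is only a sketch under stated assumptions: the function names are hypothetical, $\lambda_1$ is taken as the smaller exponent, and the formula used for $A_{21}$ is the one implied by the initial condition $dy_2(0)/dt = k_{12}Z$.

```python
import math


def two_compartment_constants(k12, k13, k21, k23, Z):
    """Exponents and coefficients of (3.5) and (3.7); lam1 is the smaller exponent."""
    s = k12 + k13 + k21 + k23              # lam1 + lam2, from (3.5.1)
    p = k13 * (k21 + k23) + k12 * k23      # lam1 * lam2, from (3.5.1)
    root = math.sqrt(s * s - 4.0 * p)
    lam1, lam2 = (s - root) / 2.0, (s + root) / 2.0
    A11 = Z * (k12 + k13 - lam2) / (lam1 - lam2)
    A12 = Z - A11
    A21 = -k12 * Z / (lam1 - lam2)
    A22 = -A21
    # The divisions below need lam1 > 0, i.e. p > 0 (some irreversible loss).
    A31 = (k13 * A11 + k23 * A21) / lam1
    A32 = (k13 * A12 + k23 * A22) / lam2
    return lam1, lam2, (A11, A12, A21, A22, A31, A32)


def label_amounts(t, lam1, lam2, coeffs):
    """Amounts of label y1, y2, y3 at time t from (3.5) and (3.7)."""
    A11, A12, A21, A22, A31, A32 = coeffs
    e1, e2 = math.exp(-lam1 * t), math.exp(-lam2 * t)
    y1 = A11 * e1 + A12 * e2
    y2 = A21 * e1 + A22 * e2
    y3 = A31 * (1.0 - e1) + A32 * (1.0 - e2)
    return y1, y2, y3


if __name__ == "__main__":
    lam1, lam2, coeffs = two_compartment_constants(0.5, 0.1, 0.3, 0.05, Z=1.0)
    print(lam1, lam2, label_amounts(3.0, lam1, lam2, coeffs))
```

A useful check on any such computation is conservation of label: $y_1 + y_2 + y_3$ should equal $Z$ at every time.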

It is now necessary to express the constants of equations (3.5) and
(3.7) in terms of experimentally measurable quantities. To do this,
let the total amount of labelled substance in the terminal compartment
at various times be determined. Then, aside from questions of experimental
error and other probabilistic matters to be considered later,
it is sufficient to equate such values at time $t$ with $y_3(t)$. Thus, the 3
independent constants upon which $y_3$ depends, $\lambda_1$, $\lambda_2$ and $A_{31}$, can be
experimentally determined. But there are four rate constants defining
the system so that measurements on amount excreted are not sufficient
to completely specify the system. In many cases it is reasonable to
assume that movement to the terminal compartment takes place from
only the compartment to which labelled substance was introduced.
This is equivalent to setting $k_{23} = 0$, and, if this assumption is made,
then measurements on the terminal compartment are sufficient to
determine the three remaining constants. If it cannot be made, then
measurements on at least one additional compartment are required.
Denote the measured concentration in the first compartment at time
$t$ by $c_1(t)$. The volume of the vascular compartment $V$ is then given by
$Z/c_1(0)$. If we now make

Assumption 4. The volume of the vascular compartment $V$ is a constant
independent of time,

then, again aside from statistical questions,

$$y_1(t) = V c_1(t). \tag{3.8}$$


This yields one additional independent equation connecting the rate 
constants with observed data and completes the specification of the 
system. (Needless to say, specification of the rate constants governing 
the behavior of a second compartment is not equivalent to the physical 
identification of that compartment.) 

The extension to $n$ compartments involves no new ideas. Equations
(3.3) now involve $n$ simultaneous linear equations, involving $n^2$ rate
constants. $y_i(t)$ for all but the terminal compartment is now a linear
combination of $n$ exponential terms of the form $e^{-\lambda_j t}$. The $\lambda_j$’s are the
roots of the $n$th degree polynomial specified by the characteristic
equation. $y_{n+1}(t)$, the time course for the terminal compartment, is a
linear combination of $n$ terms of the form $(1 - e^{-\lambda_j t})$. If measurements
are made on the terminal compartment, this yields $2n - 1$ independent
constants, the values of which are of course insufficient to specify $n^2$
rate constants. If we now restrict the system, so as to specify that
(a) flow to the terminal compartment takes place only from one central
compartment, and (b) that all the other compartments interchange
reversibly with that one central compartment, but not with each other,
we have a system containing $2n - 1$ rate constants, and one which is
completely determined by measurements on the terminal compartment
alone. This is the so-called mammillary system [6]. If such restrictions
cannot be made, then measurements on additional compartments are
required. Thus, if one retains assumption (b) but drops assumption
(a), this yields a system containing $3n - 2$ rate constants. If measurements
are now made on the terminal compartment and one additional
compartment, this also yields $3n - 2$ independent constants and is
sufficient to determine the system.


4. Estimated compartment size 


The considerations of the previous section are sufficient, at least in
principle, to determine the rate constants that, by assumptions 2 and 3,
govern the interchange of both labelled and unlabelled substances 
among the compartments. But these constants are insufficient by 
themselves to determine the total amount of unlabelled substance in 
any compartment, i.e. compartment size. This is so because irre- 
versible flow of unlabelled substance to the terminal compartment will 
inevitably empty the system unless a way is specified for additional 
unlabelled substance to enter it. This leads to 
Assumption 5. With respect to unlabelled substance the body is in 
steady state and is producing a time independent amount of such 
substance in each compartment. 
The amount produced may be thought of as entering each compartment
in a steady flow whose rate is determined directly or indirectly
by food intake. If we denote by $K_i$ the rate of entry of such substance
in the $i$th compartment, then equivalent to equations (3.3) for labelled
substance the net movement of unlabelled substance into compartment
1 is

$$K_1 - (k_{12} + k_{13})Y_1 + k_{21}Y_2 \tag{4.1.a}$$

and into compartment 2

$$K_2 + k_{12}Y_1 - (k_{21} + k_{23})Y_2. \tag{4.1.b}$$


Furthermore, by the steady state assumption both net movements are
zero, so that (4.1.a) and (4.1.b) may be equated to zero and solved
for $Y_1$ and $Y_2$. The $Y$’s obtained are the steady state compartment
sizes corresponding to the model assumed. If the determinant of the
two equations be denoted by $D$ and if we denote by $D_i$ the determinant
obtained by substituting for the $i$th column of $D$ the column vector
$(-K_i)$, then

$$Y_i = D_i/D, \qquad i = 1, 2. \tag{4.2}$$
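
The determinant solution (4.2) can be written out directly. The sketch below is illustrative only: it sets (4.1.a) and (4.1.b) to zero and applies the column-substitution rule just described. The function name and the numerical values of the $k_{ij}$ and $K_i$ are hypothetical.

```python
import numpy as np


def steady_state_sizes(k12, k13, k21, k23, K1, K2):
    """Solve (4.1.a) = (4.1.b) = 0 for Y1, Y2 by the determinant rule (4.2)."""
    # Coefficient matrix of Y1, Y2 in (4.1.a) and (4.1.b).
    coeff = np.array([
        [-(k12 + k13), k21],
        [k12, -(k21 + k23)],
    ])
    rhs = np.array([-K1, -K2])  # the column vector (-K_i)
    D = np.linalg.det(coeff)
    sizes = []
    for i in range(2):
        substituted = coeff.copy()
        substituted[:, i] = rhs  # replace the i-th column by (-K_i)
        sizes.append(np.linalg.det(substituted) / D)
    return tuple(float(s) for s in sizes)


if __name__ == "__main__":
    print(steady_state_sizes(0.5, 0.1, 0.3, 0.05, K1=1.0, K2=0.2))
```

Substituting the returned sizes back into (4.1.a) and (4.1.b) should give net movements of zero, which is a convenient check.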


These results of course generalize immediately for $n$ compartments.
The $Y_i$ are thus dependent upon the rate constants, whose values can
be determined from experiments with labelled substance, and upon
steady state flow rates of unlabelled substance $K_i$ which must be
determined in some other way. If it were possible to make measurements
of the total amounts of unlabelled substance in each of the two
compartments, then from (4.1) and the steady state assumption we
should have immediately

$$\begin{aligned}
K_1 &= (k_{12} + k_{13})Y_1 - k_{21}Y_2, \\
K_2 &= -k_{12}Y_1 + (k_{21} + k_{23})Y_2.
\end{aligned} \tag{4.3}$$

This would yield the steady state flows. In most problems, however,
one does not have access to both compartments and cannot measure
$Y_1$ and $Y_2$ directly. For the case of two compartments it is sometimes
possible to measure these two quantities indirectly. Thus, if one is
dealing with an unlabelled substance which is excreted in its original
form, say total body nitrogen [15], total amount excreted in urine and
feces can be directly measured and yields an estimate of $K_1 + K_2$.
Similarly, if $x$ denotes concentration of unlabelled substance in blood,
$Vx$ provides an estimate of $Y_1$. For unlabelled substances excreted
in metabolized form, e.g. albumin, this procedure cannot be employed
since there are no excretion measurements that can be equated to
$K_1$ and $K_2$. For three or more compartments it cannot be employed
even for simple substances like nitrogen. It is possible in general,
however, to derive limits within which the $Y_i$ must fall, no matter
what the $K_i$. In many problems these limits are quite close to each
other. Thus, considering the case of two compartments⁵ in detail,
suppose $Y_1$ is known but that no information is available about $K_1$
and $K_2$ other than that they are non-negative. Then, denoting $Y_2/Y_1$
by $U$, we have from (4.3)

$$\frac{k_{12}}{k_{21} + k_{23}} \le U \le \frac{k_{12} + k_{13}}{k_{21}}. \tag{4.4}$$

⁵ For this case Campbell et al [16] have derived an upper but not a lower limit.
Since $Y_1 + Y_2$ gives total body albumin (TBA) and is equal to
$Y_1(1 + U)$, we have

$$Y_1\,\frac{k_{12} + k_{21} + k_{23}}{k_{21} + k_{23}} \le \text{TBA} \le Y_1\,\frac{k_{12} + k_{13} + k_{21}}{k_{21}}. \tag{4.5}$$


It is of interest to compare the estimates (2.1) and (2.3) with the
limits (4.5). We shall show that the former must always lie within
these limits, but that the latter need not. Considering the former
estimate first we note $f = \lim_{t \to \infty} [y_1/(y_1 + y_2)]$ and $Vx = Y_1$ and, if
we denote $\lim_{t \to \infty} [y_2/y_1]$ by $u$, then the estimate of total body albumin
yielded by (2.1) is $Y_1(1 + u)$. To show that this lies within the limits
(4.5) it is sufficient to show that analogously with (4.4)

$$\frac{k_{12}}{k_{21} + k_{23}} < u < \frac{k_{12} + k_{13}}{k_{21}}. \tag{4.6}$$


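As a preliminary we verify the fact that the present model leads to
a distribution curve with an asymptote, like that of Figure 3. We
have from the definition

$$f = \lim_{t \to \infty} \frac{y_1}{y_1 + y_2}. \tag{4.7}$$

From (3.5) and (3.7) we may write, since $y_1 + y_2 = Z - y_3$ and $A_{31} + A_{32} = Z$,

$$\frac{y_1}{y_1 + y_2} = \frac{A_{11}e^{-\lambda_1 t} + A_{12}e^{-\lambda_2 t}}{A_{31}e^{-\lambda_1 t} + A_{32}e^{-\lambda_2 t}}. \tag{4.8}$$

It is easy to see that, if $\lambda_1$ denotes the smaller of the two exponents,

$$f = A_{11}/A_{31}, \tag{4.9}$$

thus establishing the existence of an asymptote. To evaluate this limit
in terms of the $k_{ij}$ we recall that

$$\lim_{t \to \infty} \frac{1}{y_i}\frac{dy_i}{dt} = -\lambda_1, \qquad i = 1, 2. \tag{4.11}$$

We then have from (3.3)

$$\lambda_1 = (k_{12} + k_{13}) - k_{21}u, \tag{4.12}$$

and

$$\lambda_1 u = -k_{12} + (k_{21} + k_{23})u.$$

Since $\lambda_1$, $u > 0$, (4.6) follows immediately.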
Turning to the estimate (2.3) and denoting $A_{12}/A_{11}$ by $u^*$, total
body albumin as estimated by this procedure is $Y_1(1 + u^*)$. We shall
show that $u^*$ need not lie between the limits (4.4).
From (3.6) we have

$$u^* = (\lambda_1 - k_{12} - k_{13})/(k_{12} + k_{13} - \lambda_2). \tag{4.13}$$

Furthermore, since $-\lambda_1$ and $-\lambda_2$ are the roots of the quadratic (3.5.1),

$$\lambda_1 + \lambda_2 = (k_{12} + k_{13} + k_{21} + k_{23}). \tag{4.14}$$

Substituting (4.14) thus in the denominator of (4.13) we have

$$u^* = (k_{12} + k_{13} - \lambda_1)/(k_{21} + k_{23} - \lambda_1). \tag{4.15}$$

We first consider infringement of the upper limit of (4.4). Now
from (4.12), it follows that $\lambda_1 < (k_{12} + k_{13})$ and therefore the numerator
of $u^*$ is positive. Since the upper limit of (4.4) is positive, the inequality
will be infringed only if the denominator of $u^*$ is also positive, i.e.
$\lambda_1 < (k_{21} + k_{23})$. The necessary condition that the upper limit be
infringed, namely, that $u^* > (k_{12} + k_{13})/k_{21}$, is then, from (4.15), that
$\lambda_1 > (k_{12} + k_{13})k_{23}/(k_{12} + k_{13} - k_{21})$ and $(k_{12} + k_{13} - k_{21}) > 0$. The
sufficiency follows easily since the steps in the above proof are reversible.
If $(k_{12} + k_{13} - k_{21}) < 0$, the upper limit cannot be infringed since
infringement then requires $\lambda_1 < 0$, which is impossible.

Similarly, it can be shown that a necessary and sufficient condition
for $u^*$ to infringe the lower limit is

$$\lambda_1 > k_{13}(k_{21} + k_{23})/(k_{21} + k_{23} - k_{12}) \quad \text{and} \quad (k_{21} + k_{23} - k_{12}) > 0.$$

The two conditions $(k_{12} + k_{13} - k_{21}) > 0$ and $(k_{21} + k_{23} - k_{12}) > 0$
can be written as upper limit > 1, lower limit < 1.

Since $\lambda_1 \ge 0$ (with the equality holding only when $k_{13} = k_{23} = 0$),
the upper limit will be infringed whenever $k_{23} = 0$ and $k_{13} \neq 0$,
and the lower limit whenever $k_{13} = 0$ and $k_{23} \neq 0$, provided in each
case that the corresponding condition above (upper limit > 1 or lower
limit < 1) also holds. It is often
argued on physiological grounds that one or the other of these
constants is zero, but not both (see for example reference [16a]),
so that the estimate (2.3) may be an unsatisfactory one from the point
of view of the present model, under conditions that are not pathological.
It is easy to show, however, that, when both $k_{13}$, $k_{23} = 0$, $u^* = k_{12}/k_{21}$,
which is what the upper and lower limits (4.4) reduce to. [Substitute
the first of equations (4.12) in (4.15), set $k_{13} = k_{23} = 0$, and let $u$ assume
the value of $k_{12}/k_{21}$.] From this result one can conclude that when
metabolism is slow relative to transfer ($k_{13}$ and $k_{23}$ small relative to
$k_{12}$ and $k_{21}$) the estimate (2.3) will give results that are not grossly
in error.
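
The comparison made in this section can be illustrated numerically. The sketch below is not part of the original argument: for given rate constants it evaluates the limits (4.4), the ratio $u$ from the first of equations (4.12), and the ratio $u^*$ from (4.15). The function names and the rate constants in the usage lines are hypothetical.

```python
import math


def smaller_exponent(k12, k13, k21, k23):
    """Smaller root lambda_1 of the quadratic (3.5.1), taken with positive sign."""
    s = k12 + k13 + k21 + k23
    p = k13 * (k21 + k23) + k12 * k23
    return (s - math.sqrt(s * s - 4.0 * p)) / 2.0


def ratio_summary(k12, k13, k21, k23):
    """Limits (4.4) together with u (dilution estimate) and u* (intercept estimate)."""
    lam1 = smaller_exponent(k12, k13, k21, k23)
    return {
        "lower": k12 / (k21 + k23),                         # lower limit of (4.4)
        "upper": (k12 + k13) / k21,                         # upper limit of (4.4)
        "u": (k12 + k13 - lam1) / k21,                      # first of (4.12)
        "u_star": (k12 + k13 - lam1) / (k21 + k23 - lam1),  # (4.15)
    }


if __name__ == "__main__":
    # Loss only from compartment 1 (k23 = 0): u* may exceed the upper limit.
    print(ratio_summary(0.5, 0.2, 0.3, 0.0))
    # Loss only from compartment 2 (k13 = 0): u* may fall below the lower limit.
    print(ratio_summary(0.2, 0.0, 0.3, 0.2))
```

Multiplying $1 + u$ or $1 + u^*$ by $Y_1 = Vx$ gives the corresponding estimate of total body albumin under the model.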
5. Some statistical considerations

In the preceding sections our point of view has been entirely deterministic.
Were this point of view sufficient to describe the phenomena
under study, then all observed points should fall exactly on computed
curves. As Figures 1, 2 and 3 show, however, this is far from the case.
To complete the model, therefore, it is necessary to introduce probabilistic
considerations. Our remarks on this aspect of the problem are
intended only to provide a catalogue of problems, however. In particular,
we shall not give results intended for practical application.

The most obvious, and indeed the classical, way of introducing
statistical considerations involves assuming that each observed point
consists of two components, a “true” component, whose behavior is
described by the mathematical theory, and another which is superimposed
on the true one. This second component can be thought of
as consisting of errors of measurement with the usual characteristics
of independence and zero expectation. This model leads directly to
the method of least squares as a way of estimating values of the rate
constants from experimental observations. The least squares computation,
familiar though it is, has several unusual features when applied
to linear combinations of exponentials. We shall treat some of these
first, and only afterward shall we consider the adequacy of the simple
probabilistic model upon which it is based.

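From the computer’s point of view, by far the most unusual aspect
is the frequent failure of the usual iterative computation schemes to
converge. To make this statement precise, consider the problem of
finding values $A_1$, $A_2$, $\lambda_1$, $\lambda_2$ for which

$$S = \sum_{i=1}^{n} w_i (y_i - A_1 e^{-\lambda_1 t_i} - A_2 e^{-\lambda_2 t_i})^2 \tag{5.1}$$

is a minimum and the $y_i$ are $n$ independent observations with variances
$1/w_i$. The normal equations for this system, obtained by setting the
appropriate partial derivatives equal to zero, are:

$$\begin{aligned}
\sum_{i=1}^{n} w_i e^{-\lambda_1 t_i}(y_i - A_1 e^{-\lambda_1 t_i} - A_2 e^{-\lambda_2 t_i}) &= 0, \\
\sum_{i=1}^{n} w_i e^{-\lambda_2 t_i}(y_i - A_1 e^{-\lambda_1 t_i} - A_2 e^{-\lambda_2 t_i}) &= 0, \\
\sum_{i=1}^{n} w_i t_i e^{-\lambda_1 t_i}(y_i - A_1 e^{-\lambda_1 t_i} - A_2 e^{-\lambda_2 t_i}) &= 0, \\
\sum_{i=1}^{n} w_i t_i e^{-\lambda_2 t_i}(y_i - A_1 e^{-\lambda_1 t_i} - A_2 e^{-\lambda_2 t_i}) &= 0.
\end{aligned} \tag{5.2}$$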
These equations are non-linear in $\lambda_1$ and $\lambda_2$. A common scheme for
solving the system involves selecting initial values, say $\lambda_1^0$ and $\lambda_2^0$, using
the approximation

$$A e^{-\lambda t} \approx A e^{-\lambda^0 t}(1 - t\,\Delta\lambda) \tag{5.3}$$

and substituting this for the expressions in parentheses in equation
(5.2) to obtain four simultaneous equations linear in $A_1$, $A_2$, $A_1\Delta\lambda_1$
and $A_2\Delta\lambda_2$. The solution of these equations yields a new set of initial
values, say $\lambda_1'$, $\lambda_2'$. The scheme involves proceeding iteratively in this
fashion until the $\Delta\lambda$ are sufficiently close to zero. A more precise statement
of the computational difficulty then is that there are initial values
$\lambda_1^0$ and $\lambda_2^0$ for which (a) the $\Delta\lambda$’s do not converge to zero or (b) the $\Delta\lambda$’s
converge to zero but the resulting values of $A_1$, $A_2$, $\lambda_1$ and $\lambda_2$ are not those
which minimize the sum of squares (5.1). Furthermore, initial values
for which convergence does not occur need not be pathological but may
be values which any computer would regard as quite reasonable.

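This behavior is a consequence of an unusual feature of the sum of
squares surface (5.1), namely, that it possesses not one but two minima.
This can be demonstrated quite easily. Suppose that a minimum sum
of squares is given when

$$A_1 = \hat{A}_1, \quad A_2 = \hat{A}_2, \quad \lambda_1 = \hat{\lambda}_1, \quad \lambda_2 = \hat{\lambda}_2. \tag{5.4}$$

Then, if we substitute these values in equations (5.2), we have by
hypothesis

$$\begin{aligned}
\sum_{i=1}^{n} w_i e^{-\hat{\lambda}_1 t_i}(y_i - \hat{A}_1 e^{-\hat{\lambda}_1 t_i} - \hat{A}_2 e^{-\hat{\lambda}_2 t_i}) &= 0, \\
\sum_{i=1}^{n} w_i e^{-\hat{\lambda}_2 t_i}(y_i - \hat{A}_1 e^{-\hat{\lambda}_1 t_i} - \hat{A}_2 e^{-\hat{\lambda}_2 t_i}) &= 0, \\
\sum_{i=1}^{n} w_i t_i e^{-\hat{\lambda}_1 t_i}(y_i - \hat{A}_1 e^{-\hat{\lambda}_1 t_i} - \hat{A}_2 e^{-\hat{\lambda}_2 t_i}) &= 0, \\
\sum_{i=1}^{n} w_i t_i e^{-\hat{\lambda}_2 t_i}(y_i - \hat{A}_1 e^{-\hat{\lambda}_1 t_i} - \hat{A}_2 e^{-\hat{\lambda}_2 t_i}) &= 0.
\end{aligned} \tag{5.5}$$

Now consider another set of estimates as follows:

$$A_1 = \hat{A}_2, \quad A_2 = \hat{A}_1, \quad \lambda_1 = \hat{\lambda}_2, \quad \lambda_2 = \hat{\lambda}_1. \tag{5.6}$$

If the partial derivatives of (5.1) are evaluated using these estimates,
we obtain

$$\begin{aligned}
\sum_{i=1}^{n} w_i e^{-\hat{\lambda}_2 t_i}(y_i - \hat{A}_2 e^{-\hat{\lambda}_2 t_i} - \hat{A}_1 e^{-\hat{\lambda}_1 t_i}) &= 0, \\
\sum_{i=1}^{n} w_i e^{-\hat{\lambda}_1 t_i}(y_i - \hat{A}_2 e^{-\hat{\lambda}_2 t_i} - \hat{A}_1 e^{-\hat{\lambda}_1 t_i}) &= 0, \\
\sum_{i=1}^{n} w_i t_i e^{-\hat{\lambda}_2 t_i}(y_i - \hat{A}_2 e^{-\hat{\lambda}_2 t_i} - \hat{A}_1 e^{-\hat{\lambda}_1 t_i}) &= 0, \\
\sum_{i=1}^{n} w_i t_i e^{-\hat{\lambda}_1 t_i}(y_i - \hat{A}_2 e^{-\hat{\lambda}_2 t_i} - \hat{A}_1 e^{-\hat{\lambda}_1 t_i}) &= 0.
\end{aligned} \tag{5.7}$$

By interchanging the first and second and third and fourth of equations
(5.7) we obtain equations (5.5). Thus, if the estimates (5.4) give a
minimum sum of squares, so do (5.6).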
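
The symmetry argument can be seen directly by evaluating the sum of squares (5.1) at a parameter set and at its relabelled counterpart. The following is a small sketch with hypothetical names and made-up data, intended only to show that the two values coincide for any data.

```python
import numpy as np


def weighted_sum_of_squares(params, t, y, w):
    """S of equation (5.1) for params = (A1, A2, lam1, lam2)."""
    A1, A2, lam1, lam2 = params
    residual = y - A1 * np.exp(-lam1 * t) - A2 * np.exp(-lam2 * t)
    return float(np.sum(w * residual ** 2))


def swap_labels(params):
    """Relabel the two exponential terms, as in going from (5.4) to (5.6)."""
    A1, A2, lam1, lam2 = params
    return (A2, A1, lam2, lam1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)          # made-up observations
    t = np.linspace(0.5, 10.0, 20)
    y = 3.0 * np.exp(-0.2 * t) + 2.0 * np.exp(-2.0 * t) + rng.normal(0.0, 0.05, t.size)
    w = np.ones_like(t)
    p = (2.5, 1.5, 0.3, 1.0)
    print(weighted_sum_of_squares(p, t, y, w),
          weighted_sum_of_squares(swap_labels(p), t, y, w))
```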

At first sight the existence of a double minimum seems trivial, since
it seems merely to reflect the fact that we are free to call either exponent
the first one. But the existence of two absolute minima implies the
existence of a stationary point in the sum of squares surface at some
intermediate position, and all partial derivatives are zero at this point
as well as at the two minima. Furthermore, if the stationary point is
a saddle point, the determinant of the system of linear simultaneous
equations obtained by substituting (5.3) in (5.2) will be negative at
this point [16b]. Since this determinant is positive at both the absolute
minima, and is a continuous function of the two exponents, there are
at least two points at which the determinant will have value zero.
Initial values of the $\lambda$ selected in the neighborhood of these points lead
to ill-determined systems of equations, while initial values selected
near the saddle point lead to solutions which are not minima.

We now show that there exists at least one stationary point which
is not an absolute minimum. This will be done for two exponentials
but the extension is obvious. Also, for convenience, the notation will
differ somewhat from the previous.
Let the sum of squares surface be

$$z = \sum (y - ae^{-ut} - be^{-vt})^2, \tag{5.7.1}$$

where $u$, $v$, $a$ and $b$ are the parameters to be estimated such that $z$ is a
minimum. Without any loss of generality let $a$ and $b$ be fixed. Assume
the absolute minimum exists at $(u_1, v_1)$ and $(v_1, u_1)$, $u_1 \neq v_1$. The
first- and second-order partial derivatives are easily found to be:

$$z_u = 2a \sum te^{-ut}(y - ae^{-ut} - be^{-vt}), \tag{5.7.2}$$

$$z_v = 2b \sum te^{-vt}(y - ae^{-ut} - be^{-vt}), \tag{5.7.3}$$

$$z_{uu} = -2a \sum t^2 e^{-ut}(y - 2ae^{-ut} - be^{-vt}), \tag{5.7.4}$$

$$z_{uv} = 2ab \sum t^2 e^{-(u+v)t}, \tag{5.7.5}$$

$$z_{vv} = -2b \sum t^2 e^{-vt}(y - ae^{-ut} - 2be^{-vt}). \tag{5.7.6}$$


Consider now fitting the same set of observations, $(y_i, t_i)$, with a
single exponential of the specific form, $(a + b)e^{-ut}$. The sum of squares
surface is then

$$w = \sum [y - (a + b)e^{-ut}]^2. \tag{5.7.7}$$

Clearly, in the set of all single exponentials resulting from the continuous
variation of $u$ over its possible values there exists one which
gives a better fit than all others. That is, there is a $u = u_0$, say, for
which $w(u_0) \le w(u_i)$, $u_i \neq u_0$. By including the equality, we allow
for the possibility that the minimum value of $w$ may occur at more
than one value of $u$. It then follows that

$$\left.\frac{dw}{du}\right|_{u = u_0} = 2(a + b)\sum te^{-u_0 t}[y - (a + b)e^{-u_0 t}] = 0 \tag{5.7.8}$$

and

$$\left.\frac{d^2w}{du^2}\right|_{u = u_0} = -2(a + b)\sum t^2 e^{-u_0 t}[y - 2(a + b)e^{-u_0 t}] > 0. \tag{5.7.9}$$

In (5.7.1) let $v = u$. Then from (5.7.2) and (5.7.3)

$$z_u|_{v = u} = 2a \sum te^{-ut}[y - (a + b)e^{-ut}], \tag{5.7.10}$$

$$z_v|_{v = u} = 2b \sum te^{-ut}[y - (a + b)e^{-ut}]. \tag{5.7.11}$$

But because of (5.7.8) both these derivatives must be zero at $u = u_0$.
We thus have established the existence of at least one stationary value
of (5.7.1) at $(u_0, u_0)$ which occurs in addition to those at $(u_1, v_1)$ and
$(v_1, u_1)$.

We now inquire into the conditions under which the stationary
point occurring at $(u_0, u_0)$ is a saddle point.

Let $v = u$ in (5.7.4), (5.7.5) and (5.7.6).
Then

$$z_{uu} = -2a \sum t^2 e^{-ut}[y - (2a + b)e^{-ut}], \tag{5.7.12}$$

$$z_{uv} = 2ab \sum t^2 e^{-2ut}, \tag{5.7.13}$$

$$z_{vv} = -2b \sum t^2 e^{-ut}[y - (a + 2b)e^{-ut}]. \tag{5.7.14}$$

The determinant of the second-order partial derivatives, each evaluated
at $(u_0, u_0)$, is

$$D = z_{uu}z_{vv} - (z_{uv})^2 = 4ab\Big\{\sum t^2 e^{-u_0 t}[y - (a + b)e^{-u_0 t}]\Big\}\Big\{\sum t^2 e^{-u_0 t}[y - 2(a + b)e^{-u_0 t}]\Big\}. \tag{5.7.15}$$

Because of (5.7.9), the second factor in the right-hand side must be
negative. For the stationary point $(u_0, u_0)$ to be a saddle point, it
is therefore necessary for the first factor to be positive.
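
The argument leading to (5.7.15) can be followed numerically on a small example. The sketch below is illustrative and rests on several assumptions that are not in the original text: the observations are noise-free values of a two-exponential curve with made-up parameters, $a$ and $b$ are held fixed, and $u_0$ is located by bisection on the sum in (5.7.8), which presumes a change of sign from negative to positive on the search interval. All function names are hypothetical.

```python
import numpy as np


def dw_du_factor(u, t, y, a, b):
    """Sum appearing in (5.7.8); dw/du equals 2(a + b) times this value."""
    e = np.exp(-u * t)
    return float(np.sum(t * e * (y - (a + b) * e)))


def find_u0(t, y, a, b, lo=0.0, hi=10.0, iterations=200):
    """Bisection for a root of dw/du, assuming it is negative at lo and positive at hi."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if dw_du_factor(mid, t, y, a, b) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gradient_z(u, v, t, y, a, b):
    """First-order partial derivatives (5.7.2) and (5.7.3)."""
    residual = y - a * np.exp(-u * t) - b * np.exp(-v * t)
    z_u = 2.0 * a * np.sum(t * np.exp(-u * t) * residual)
    z_v = 2.0 * b * np.sum(t * np.exp(-v * t) * residual)
    return float(z_u), float(z_v)


def determinant_factors(u0, t, y, a, b):
    """The two bracketed sums of (5.7.15), evaluated at (u0, u0)."""
    e = np.exp(-u0 * t)
    first = float(np.sum(t ** 2 * e * (y - (a + b) * e)))
    second = float(np.sum(t ** 2 * e * (y - 2.0 * (a + b) * e)))
    return first, second


# Made-up, noise-free observations from a two-exponential curve.
t_obs = np.arange(1, 41) * 0.25
a_fix, b_fix = 1.0, 1.0
y_obs = a_fix * np.exp(-0.2 * t_obs) + b_fix * np.exp(-2.0 * t_obs)

if __name__ == "__main__":
    u0 = find_u0(t_obs, y_obs, a_fix, b_fix)
    first, second = determinant_factors(u0, t_obs, y_obs, a_fix, b_fix)
    print(u0, gradient_z(u0, u0, t_obs, y_obs, a_fix, b_fix))
    print(4.0 * a_fix * b_fix * first * second)  # D of (5.7.15)
```

For data of this kind both partial derivatives vanish at $(u_0, u_0)$ and the sign of $D$ indicates whether the stationary point is a saddle point.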

The sign of the first factor will depend upon the individual $y$, or rather, upon the
deviations of the $y$ from the least squares fit. If we denote

$$f_i = e^{-u_0 t_i}[y_i - (a + b)e^{-u_0 t_i}],$$

the first factor can be written as $\sum t^2 f$. We now present a necessary
and sufficient condition that $\sum t^2 f$ be positive. (We omit the index
of summation which of course is taken over all times of observation.)
Divide all the $f$ into two subsets, the first containing all negative $f$,
denoted by $f_1$, and the second containing all positive $f$, denoted by $f_2$.
We define the following measures: $F_1 = \sum f_1 < 0$, $F_2 = \sum f_2 > 0$;
$T_1 = \sum f_1 t/F_1 > 0$, the mean time of all negative $f$’s; $T_2 = \sum f_2 t/F_2 > 0$,
the mean time of all positive $f$’s; $\sigma_1^2 = \sum f_1(t - T_1)^2/F_1$, the variance
of the times associated with the negative $f$’s; and $\sigma_2^2 = \sum f_2(t - T_2)^2/F_2$,
the variance of the times associated with the positive $f$’s. Since $F_j\sigma_j^2 =
\sum f_j t^2 - T_j^2 F_j$, $j = 1, 2$, we have $\sum t^2 f = \sum f_1 t^2 + \sum f_2 t^2 = F_1 T_1^2 +
F_1\sigma_1^2 + F_2 T_2^2 + F_2\sigma_2^2$. But from the least squares condition (5.7.8)
$\sum ft = \sum f_1 t + \sum f_2 t = T_1 F_1 + T_2 F_2 = 0$ and hence $F_1 =
-(T_2/T_1)F_2$. Thus $\sum t^2 f = F_2[T_2^2 - T_1 T_2 + \sigma_2^2 - (T_2/T_1)\sigma_1^2]$.
Clearly a necessary and sufficient condition that $\sum ft^2 > 0$ is that

$$\sigma_2^2 > (T_2/T_1)\sigma_1^2 + T_2(T_1 - T_2).$$

This in effect says that the variance of the times associated with the
positive deviations from the least squares exponential must be bounded
from below by a certain amount. It is also clear that a weaker sufficient
condition on $\sigma_2^2$ is possible if one adds the condition that $T_2 > T_1$.

The condition given for the sign of the first factor in (5.7.15) to be
positive is not unreasonable. In fact, for data that are concave upward,
like those of Figure 1, it will probably always hold, since the times of
negative deviations will tend to cluster in the center whereas the times
of positive deviations will tend to occur at the extremes, thus yielding
a greater variance in the positive times. It appears then for many
sets of data that the stationary point will be a saddle point and the
determinants of the system of linear simultaneous equations will be
zero for at least two points in the $(\lambda_1, \lambda_2)$-plane.
There are various ways of avoiding the difficulties introduced by
the existence of the stationary point. One involves computing $S$ in
equation (5.1) for a sufficiently large number of combinations of $\lambda_1$
and $\lambda_2$, so that the minimum point is physically identified. Another
involves taking as an entering approximation only $(\lambda_1, \lambda_2)$-combinations
known not to lie between the two points leading to minima. (This
can be done, for example, by setting either $\lambda = 0$.) Either procedure
involves a good deal of calculation, and may require access to high-speed
computing equipment.

Two methods of reducing the burden of calculation that have been
proposed make use of the fact that the differential equation leading to
linear combinations of exponentials, e.g. equation (3.4), has constant
coefficients. Prony’s method, as presented by Whittaker and Robinson
[17], involves substituting linear difference equations in $y_i$ for the
differential equations and using the method of least squares to solve
for the coefficients. If errors on the individual $y_i$ are in fact independent,
however, then the generated errors in the successive difference equations
involving the $y_i$ are not, and the method of least squares is not
applicable. The error involved in using Prony’s method for such a
situation is identical with that involved in estimating a linear regression
coefficient by use of the difference between the first and last observations.
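
To make the idea of the difference-equation approach concrete, the following sketch shows one common formulation for two exponentials and equally spaced observations; it is an illustration under stated assumptions, not a transcription of the presentation in [17]. The function name and test values are hypothetical, and the data are taken to be noise-free, so the objection about correlated errors raised above does not arise in the example.

```python
import numpy as np


def prony_two_exponentials(y, dt):
    """Estimate two exponents and amplitudes from equally spaced samples (spacing dt)."""
    y = np.asarray(y, dtype=float)
    # Difference equation y[n+2] = c1*y[n+1] + c0*y[n], fitted by least squares.
    design = np.column_stack([y[1:-1], y[:-2]])
    (c1, c0), *_ = np.linalg.lstsq(design, y[2:], rcond=None)
    # The roots of r**2 - c1*r - c0 = 0 are exp(-lambda*dt); with noisy data
    # they may be complex, and only the real part is kept in this sketch.
    roots = np.roots([1.0, -c1, -c0])
    lambdas = np.sort(-np.log(np.real(roots)) / dt)
    # With the exponents fixed, the amplitudes enter linearly.
    times = np.arange(len(y)) * dt
    basis = np.column_stack([np.exp(-lam * times) for lam in lambdas])
    amplitudes, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return lambdas, amplitudes


if __name__ == "__main__":
    dt = 0.25
    t = np.arange(20) * dt
    y = 3.0 * np.exp(-0.2 * t) + 2.0 * np.exp(-2.0 * t)
    print(prony_two_exponentials(y, dt))
```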

A second way of taking advantage of the linearity of the differential
equation is provided by Hartley’s method of internal least squares
[18]. Rewriting equation (3.4) as

$$\frac{d^2y}{dt^2} + a_1\frac{dy}{dt} + a_2 y = 0, \tag{5.8}$$

Hartley’s method would proceed by integrating once to obtain

$$\frac{dy}{dt} + a_1 y + a_2\int_0^t y\,dt + C_1 = 0 \tag{5.9}$$

and again to obtain

$$y(T) + a_1\int_0^T y\,dt + a_2\int_0^T\!\!\int_0^t y\,dt\,dt + C_1 T + C_2 = 0. \tag{5.10}$$

The required integrals would be obtained by numerical integration
from the observed $y_i$ and the method of least squares then used to obtain
estimates of the required constants, $a_1$, $a_2$, $C_1$ and $C_2$ in this case. This
method is asymptotically equivalent to least squares. We have been
interested in the possibilities of extending Hartley’s procedure by use
of an iterative calculating scheme. For this purpose one would start
with initial estimates of the constants. From these one would compute
the required integrals by means of equation (3.5). If these integrals
are denoted by $I_1$ and $I_2$, a second set of estimates of the constants is
then given by minimizing

$$\sum w(y - a_1 I_1 - a_2 I_2 - C_1 T - C_2)^2. \tag{5.11}$$

In the one example to which we have applied this method, the estimates
obtained do converge to a least squares solution, but with such incredible
slowness as to cause us to lose all further interest.
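
The non-iterative form of the procedure in (5.8) to (5.10) can be sketched as follows. This is a minimal illustration under assumptions of our own choosing: the observations are equally spaced and noise-free, the integrals are obtained by the trapezoidal rule, and all weights are equal. The function names and the test curve are hypothetical.

```python
import numpy as np


def cumulative_trapezoid(values, dt):
    """Running trapezoidal integral of equally spaced values, starting at zero."""
    increments = 0.5 * (values[1:] + values[:-1]) * dt
    return np.concatenate([[0.0], np.cumsum(increments)])


def internal_least_squares(y, dt):
    """Estimate a1 and a2 of (5.8) by regression on the integrals in (5.10)."""
    y = np.asarray(y, dtype=float)
    T = np.arange(len(y)) * dt
    I1 = cumulative_trapezoid(y, dt)    # integral of y from 0 to T
    I2 = cumulative_trapezoid(I1, dt)   # double integral of y
    # (5.10): y + a1*I1 + a2*I2 + C1*T + C2 = 0, so regress -y on the other terms.
    design = np.column_stack([I1, I2, T, np.ones_like(T)])
    coeffs, *_ = np.linalg.lstsq(design, -y, rcond=None)
    a1, a2 = float(coeffs[0]), float(coeffs[1])
    # For y = exp(-lambda*t), (5.8) gives lambda**2 - a1*lambda + a2 = 0.
    lambdas = np.sort(np.real(np.roots([1.0, -a1, a2])))
    return a1, a2, lambdas


if __name__ == "__main__":
    dt = 0.01
    t = np.arange(1001) * dt
    y = 3.0 * np.exp(-0.2 * t) + 2.0 * np.exp(-2.0 * t)
    print(internal_least_squares(y, dt))
```

With a fine spacing the recovered exponents should be close to those used to generate the curve; the small discrepancy that remains comes from the numerical integration.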

It is to be noted that the discussion has been in terms of the $A$’s
and $\lambda$’s, whereas our interest is actually in the underlying $k$’s. Fears
are sometimes expressed as to the non-identity of the $k$’s derived from
the least squares estimates of the $A$’s and $\lambda$’s via equations (3.5.1) and
(3.6) and the $k$’s obtained by differentiating the sum of squares with
respect to each of the $k$’s. Those fears are unjustified. Thus consider
a function of $t$ and unknown constants $u_1, \ldots, u_n$, say $f(t, u_1, \ldots, u_n)$.
Suppose $u_j$ is a function, not depending on $t$, of $n$ other constants
$V_1, \ldots, V_n$, say $\phi_j(V_1, \ldots, V_n)$. Then if the observation corresponding
to $t_i$ is $y_i$, the sum of squares is

$$\sum_i w_i[y_i - f(t_i, u_1, \ldots, u_n)]^2. \tag{5.12}$$

The $n$ equations obtained by differentiating with respect to $u_j$ are

$$\sum_i w_i \frac{\partial f}{\partial u_j}[y_i - f(t_i, u_1, \ldots, u_n)] = 0. \tag{5.13}$$

Those obtained by differentiating with respect to $V_k$ are

$$\sum_i w_i \sum_j \frac{\partial f}{\partial u_j}\frac{\partial \phi_j}{\partial V_k}[y_i - f(t_i, u_1, \ldots, u_n)] = 0. \tag{5.14}$$

But, since $\phi_j$, and hence $\partial\phi_j/\partial V_k$, does not depend on $t$, it is constant
throughout the summation on subscript $i$ and may be taken outside the summation on $i$.
The $n$ equations (5.14) are thus simply a linear compound of the $n$
equations (5.13) and any set of $u$ satisfying (5.13) will also satisfy (5.14).
Furthermore, if the $u_j$ are single-valued functions of $V_1, \ldots, V_n$
and the $V_k$ are single-valued functions of $u_1, \ldots, u_n$ [as in equations
(3.6) and (3.6.1) and their inverses], a solution for one yields the other
uniquely.
It is also to be noted that we have discussed least squares estimation
for only one set of observations. But in the present situation two
sets of observations, on urine and on blood, are available. These two
sets are described by different equations, e.g. (3.5) and (3.7), but the
constants of these equations are not independent of each other. The
constants must thus be determined simultaneously from both sets of
observations. This involves no difficulty in principle, since one merely
minimizes a different sum of squares. It does involve a considerable
complication in practice, however.

A question of considerable interest relates to the number of com- 
partments required to specify a system. This is equivalent to asking 
how many exponential terms of the form of equations (3.5) are required 
to describe a given set of experimental results. No exact way of answer- 
ing this question is known to us. By analogy with methods appropriate 
to the fitting of polynomials, one is tempted to investigate the evidence 
for an additional term by testing for significance the reduction in the 
sum of squares occasioned by fitting the two additional constants implied 
by an additional term. There is an important difference between the 
two problems, however. A polynomial specified by n constants can be 
passed exactly through n arbitrary points, but a linear combination of 
n/2 exponentials specified by n real numbers cannot. Each constant 
in such a polynomial can, so to speak, effect a larger reduction in the 
sum of squares than can each constant in such a linear combination of 
exponentials. This is simply an intuitive way of saying that the usual 
theory, which covers the case in which the unknown constants appear 
linearly, does not provide an exact treatment for cases such as the 
present, in which some of the fitted constants appear non-linearly. It 
seems clear that the P-value obtained by applying the usual theory to 
the reduction in sum of squares effected by an additional exponential 
term provides an upper limit to the correct value, so that a component 
found significant by the usual theory can be considered significant. 
The converse is false, however, and the magnitude of the error involved
in acting as if it were true is unknown to us.

All this assumes the adequacy of the simple additive, least squares 
model. If the only sources of variation in the system were errors in 
determining the amount of label present, the model would follow 
immediately. Such errors could arise very easily in dealing with 
radioactive isotopes because of the normal Poisson error in the counting 
rate and in dealing with stable isotopes because of errors in measuring 
net peak heights in the mass spectrometer. There are other sources 
of variation which could be present, however, which could materially 
affect the adequacy of this formulation. We are most interested in the 
situation which could arise when measurements at successive times 
are made on the same organism and it is considered, not in the somewhat 
unrealistic deterministic fashion of Section 3, but as being in a stochastic 
state. More exactly, we substitute for Assumption 3 the following less 
restrictive and perhaps more realistic 

Assumption 3'. The fractional amount of unlabelled substance going 
from compartment i to compartment j in unit time is a normally 
distributed variable with mean $\lambda_{ij}$ and variance $\sigma_{ij}^2$. 

We devote the remainder of this paper to exploring in a restricted 
and quite preliminary way some of the more elementary consequences 
of this viewpoint. We consider initially a one-compartment system 
and assume no measurement error. Consider the amounts of label in 
such a compartment at n equally spaced intervals of time $\Delta t$ and denote 
these amounts by $y_\alpha$ ($\alpha = 0, 1, \ldots, n$). Then by Assumption 3' 

$$(y_\alpha - y_{\alpha-1})/y_{\alpha-1} = \lambda + \epsilon_\alpha, \qquad (5.15)$$

where 

$$E(\epsilon_\alpha) = 0 \ \text{for all } \alpha, \qquad V(\epsilon_\alpha) = \sigma^2 \ \text{for all } \alpha.$$

For what follows it is also necessary to specify $E(\epsilon_\alpha \epsilon_\beta)$. We shall make 
the assumption that for all $\alpha$ not equal to $\beta$, this covariance is zero, 
even though it seems clear that for sufficiently small $\Delta t$ it would be 
physiologically difficult, if not impossible, for this to be the case and 
that the assumption of an autocorrelation function would be more 
satisfactory for small $\Delta t$. 

There are two immediate consequences of interest. The first is that 
the least squares estimate of $\lambda$ is obtained, not by fitting a least squares 
exponential curve to the observations, but rather by setting 

$$\hat{\lambda} = \frac{1}{n} \sum_{\alpha=1}^{n} \frac{y_\alpha - y_{\alpha-1}}{y_{\alpha-1}}. \qquad (5.16)$$

Thus, the usual, apparently naive, practice of chemists in estimating 
rate constants, the averaging of the values of the constant computed 
for each of a series of time intervals [19, 20], is seen to follow from the 
assumption that the reaction is proceeding stochastically, and that 
there are no errors of measurement. The second consequence of interest 
is that this model leads to constant coefficients of variation for the 
observed $y_\alpha$ rather than constant variances. Thus, using the usual 
notation for conditional expectations and variances, it is easy to see 
from (5.15) that $E(y_\alpha \mid y_{\alpha-1}) = (1 + \lambda) y_{\alpha-1}$ and 
$V(y_\alpha \mid y_{\alpha-1}) = \sigma^2 y_{\alpha-1}^2$, so that 

$$V(y_\alpha \mid y_{\alpha-1}) / E^2(y_\alpha \mid y_{\alpha-1}) = \sigma^2/(1 + \lambda)^2 \qquad (5.17)$$

for all $\alpha$. 
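
The averaging rule can be tried on simulated data. The sketch below is only an illustration under stated assumptions: it writes `lam` for the mean fractional change per interval and `sigma` for its standard deviation (both names and all numerical values are hypothetical), generates a one-compartment series with no measurement error, and averages the interval-by-interval fractional changes.

```python
import random

def simulate(y0, lam, sigma, n, rng):
    """Generate y_0..y_n with (y_a - y_{a-1}) / y_{a-1} = lam + eps_a, eps_a ~ N(0, sigma^2)."""
    ys = [y0]
    for _ in range(n):
        ys.append(ys[-1] * (1.0 + lam + rng.gauss(0.0, sigma)))
    return ys

def ratio_estimate(ys):
    """Average of the fractional changes computed for each successive interval."""
    ratios = [(b - a) / a for a, b in zip(ys, ys[1:])]
    return sum(ratios) / len(ratios)

rng = random.Random(0)                      # fixed seed so the run is repeatable
ys = simulate(y0=1000.0, lam=-0.05, sigma=0.01, n=200, rng=rng)
lam_hat = ratio_estimate(ys)                # should lie close to the assumed -0.05
```

Because each ratio equals the rate constant plus an independent error of constant variance, the plain average is the least squares estimate under these assumptions.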
This way of looking at the estimation of the rate constant for a one 
compartment stochastic system suggests an immediate generalization 
for multi-compartment systems. The estimate (5.16) is equivalent to 
that obtained by using a simple autoregressive scheme 

$$y_\alpha = (1 + \lambda) y_{\alpha-1} + u_\alpha, \qquad (5.18)$$

where the $u_\alpha$ are independent variables with zero mean and variances 
equal to $\sigma^2 y_{\alpha-1}^2$. (It is of interest to contrast the result with that 
obtained for the same situation, but assuming equal variances [21].) 
The multi-compartment scheme can thus be treated as a general 
autoregressive scheme 

$$y_\alpha = K_0 y_{\alpha-1} + K_1 y_{\alpha-2} + \cdots + v_\alpha, \qquad (5.19)$$

where the $v_\alpha$ are independent variables with zero means and variances 
proportional to easily computed functions of the $y_\alpha$. Except for the 
unequal variances, this leads to an estimation procedure which is the 
same as Prony's. 

The superimposition of measurement error on stochastic variation 
introduces serious complications. Denote the observation by $z_\alpha$, 
where $z_\alpha = y_\alpha + \delta_\alpha$ and $\delta_\alpha$ is normally distributed with $E(\delta_\alpha) = 0$, 
$E(\delta_\alpha \delta_\beta) = 0$ for $\alpha \neq \beta$, $E(\delta_\alpha u_\beta) = 0$ for all $\alpha$ and $\beta$, and $V(\delta_\alpha) = \sigma_\delta^2$. 

Analogous to (5.18) we now have 

$$z_\alpha = (1 + \lambda) z_{\alpha-1} + u_\alpha - (1 + \lambda)\delta_{\alpha-1} + \delta_\alpha, \qquad (5.20)$$

but since the additive error now involves both $\delta_\alpha$ and $\delta_{\alpha-1}$, successive 
$z_\alpha$ are not independent, and the simple estimation procedure (5.16) is 
not applicable. It would in principle be possible to write the joint 
distribution of the $z_\alpha$ and from this derive maximum likelihood estimates 
of the $y_\alpha$, $\lambda$, $\sigma^2$ and $\sigma_\delta^2$, but this is not a very attractive calculation. 


GENETIC CORRELATIONS WITH MULTIPLE ALLELES 


R. G. STANTON 


University of Waterloo, Waterloo, Ontario, Canada 


INTRODUCTION 


Hogben [1932] gave the values for filial and fraternal correlations 
for matings involving two alleles in a panmictic population. The sub- 
ject has been considered in great detail by Li [1955], but the known 
results are restricted to the case of two alleles. Consequently, if it is 
desired to compare a correlation derived from actual experimental data 
with a theoretical correlation computed on the basis of some genetic 
hypothesis, one can do so only if two alleles are involved. The present 
study will show how genetic correlations may be computed when more 
than two alleles are involved. While most results hold for n alleles, 
it will be convenient to employ n = 3 in some of the demonstrations. 
At a later time, it will be shown how to extend the results of Penrose 
[1933] and Stanton [1946] to the case of three alleles. 


GENETIC WEIGHTING 


Suppose that we consider the case of autosomal genes when n alleles 
$A_1, \ldots, A_n$ are involved at a locus. Then, if n = 2, it is well known 
that one may assign weights 0, 1, 2 to the genotypes $A_1A_1$, $A_1A_2$, 
$A_2A_2$, respectively, and thence compute correlations between relatives; 
this arbitrary assignment of weights is a natural formulation of the 
fact that the heterozygote $A_1A_2$ is intermediate between the two 
homozygotes. 

When we attempt to extend this procedure to n alleles, we encounter 
difficulty; suppose we consider n = 3. Then a superficial approach 
would immediately lead us to arrange the genotypes as $A_1A_1$, $A_1A_2$, 
$A_2A_2$, $A_2A_3$, $A_3A_3$, $A_3A_1$, and assign weights 0, 1, 2, 3, 4, 5. 
Unfortunately, this assigns a weight 5 to $A_3A_1$, and it is thus not 
intermediate between the two homozygotes of weights 0 and 4. Actually, 
this failure is not too surprising if we realize that the weights 0, 1, 2, ... 
are arbitrarily assigned mathematical weights and do not represent 
genetic effects per se. We need some sort of weighting scheme whereby 
the homozygotes are located symmetrically with respect to one another; 
clearly, a system of real weights 0, 1, 2, ... is spread 
out linearly along the weight-axis, and will not possess the required 
symmetry. 

These considerations suggest that we might employ hypercomplex 
numbers. Actually, our procedure has a vague analogy with electricity, 
where real current effects are described in terms of complex numbers. 
Here we suppose that the n homozygotes are situated at the vertices 
of a regular simplex with n vertices in (n − 1)-dimensional space; it 
is then natural to place the heterozygotes at the mid-points of the 
edges. For symmetry, we suppose that the origin of coordinates is at 
the centroid of the simplex. Then we assign to the homozygote $A_iA_i$ 
the vectorial weight $v_i$, where $v_i$ represents the vector joining the origin 
to the point representing $A_iA_i$. To the heterozygote $A_iA_j$, we assign 
the vectorial weight given by the vector joining the origin to the point 
representing $A_iA_j$, that is, the weight $\tfrac{1}{2}(v_i + v_j)$. We need to make 
the space of vectors into an algebra since, in computing correlations, 
the product of two vectors will occur. The only workable definition 
of such a product is the ordinary scalar product; with this convention, 
we have 


$$v_i^2 = v_i \cdot v_i = \text{constant} \quad (i = 1, \ldots, n). \qquad (1)$$

For convenience, this constant will be taken as 1 (that is, the vectors 
joining the origin to the n vertices of the simplex are all taken as having 
unit length). Then we may formulate the fact that the edges of the 
simplex are all of equal length as 

$$v_i \cdot v_j = \text{constant} \quad (i \neq j). \qquad (2)$$

Also, since the centroid of the simplex is at the origin, we obtain 

$$v_1 + v_2 + \cdots + v_n = 0. \qquad (3)$$

Multiplication of (3) by $v_i$ gives 

$$v_i^2 + (n - 1)\, v_i \cdot v_j = 0;$$

hence, using (1), we evaluate the constant in (2) and obtain 

$$v_i \cdot v_j = -1/(n - 1) \quad (i \neq j). \qquad (4)$$


This weighting scheme is illustrated in Figure 1 for n = 3 (in this 
case, the simplex is just an equilateral triangle and the weights are 
ordinary complex numbers; obviously this scheme, whereby the 3 homo- 
zygotes are situated at the vertices of an equilateral triangle, is the only 
possible symmetrical geometric arrangement of 3 points). One might 
also give a physical realization of the case n = 4 (a tetrahedron, the 
weights being ordinary Cartesian vectors ai + bj + ck). 

FIGURE 1 
WEIGHTING SCHEME FOR THREE ALLELES 
[Equilateral triangle with the homozygotes $A_1A_1$, $A_2A_2$, $A_3A_3$ at its vertices.] 

Using equations (1) and (4), we find the following numerical results, which we shall 
need in the following sections, for the case n = 3. 


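$$v_i^2 = 1, \qquad v_i \cdot v_j = -\tfrac{1}{2}, \qquad [\tfrac{1}{2}(v_i + v_j)]^2 = \tfrac{1}{4},$$

$$v_i \cdot \tfrac{1}{2}(v_i + v_j) = \tfrac{1}{4}, \qquad v_i \cdot \tfrac{1}{2}(v_j + v_k) = -\tfrac{1}{2}, \qquad \tfrac{1}{2}(v_i + v_j) \cdot \tfrac{1}{2}(v_i + v_k) = -\tfrac{1}{8}. \qquad (5)$$


Throughout the relations (5), i, j, and k range from 1 to 3 but are distinct 
from one another in any single relation. This convention will be 
employed henceforth. 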
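
For n = 3 the weights are ordinary complex numbers, so the scalar products that follow from (1) and (4) can be checked numerically. The sketch below is an illustration only: it assumes the three homozygotes sit at the cube roots of unity and takes the scalar product of two complex numbers regarded as plane vectors; the helper names `dot` and `het` are hypothetical.

```python
import cmath
import math

def dot(z, w):
    """Scalar product of two complex numbers regarded as plane vectors."""
    return (z * w.conjugate()).real

# Homozygote weights: unit vectors 120 degrees apart (cube roots of unity).
v = [cmath.exp(2j * math.pi * i / 3) for i in range(3)]

def het(i, j):
    """Heterozygote weight: mid-point of the edge joining v[i] and v[j]."""
    return (v[i] + v[j]) / 2

centroid = sum(v)  # should vanish, as in relation (3)
products = {
    "v_i.v_i": dot(v[0], v[0]),
    "v_i.v_j": dot(v[0], v[1]),
    "het_ij.het_ij": dot(het(0, 1), het(0, 1)),
    "v_i.het_ij": dot(v[0], het(0, 1)),
    "v_i.het_jk": dot(v[0], het(1, 2)),
    "het_ij.het_ik": dot(het(0, 1), het(0, 2)),
}
```

By symmetry the same values are obtained for any other choice of distinct indices.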


AN ALTERNATIVE APPROACH TO THE WEIGHTING PROBLEM 


In this section, we shall indicate another approach to the weighting 
problem discussed in the previous section. Again, consider a panmictic 
population containing n alleles $A_1, \ldots, A_n$ at a given locus. Then 
the genotypes may be characterized by the vector variable $(X_1, X_2, \ldots, X_n)$, 
where $X_i$ is 0, 1, or 2, and denotes the number of $A_i$ genes 
in the genotype. For n = 3, these vectors are (2, 0, 0), (1, 1, 0), (1, 0, 1), 
(0, 1, 1), (0, 2, 0), (0, 0, 2), and they all lie on the plane $X_1 + X_2 + X_3 = 2$. 
This situation is shown in Figure 2. Clearly the only distinction 
between Figure 1 and Figure 2 is that the origin in Figure 2 has been 
projected into the centroid of the triangle and a change of scale has 
been introduced in order to normalize distances and make the distance 
from the centroid to each vertex equal to unity. 

FIGURE 2 
ALTERNATIVE WEIGHTING SCHEME FOR THREE ALLELES 

Another possibility is to project not onto the plane $X_1 + X_2 + X_3 = 2$ 
but onto the plane $X_3 = 0$; the resulting vectors are shown in 
Figure 3. While this scheme is not as symmetrical as that of Figure 1, 
it is exactly analogous to the familiar case of 2 alleles where the 
genotypes (2, 0), (1, 1), and (0, 2) all lie on the line $X_1 + X_2 = 2$; projection 
onto the axis $X_2 = 0$ produces weights 0, 1, 2, whereas the symmetrical 
weights −1, 0, 1 are obtained by taking the origin at (1, 1) and 
normalizing the distances. Computation is straightforward under either 
system. 

FIGURE 3 
PROJECTED WEIGHTING SCHEME FOR THREE ALLELES 


In concluding this section, it might be noted that the present paper 
merely aims at developing the theoretical correlation coefficients; the 
distinct problem of determining whether the sampling distribution of 
the sample correlation coefficient is independent of gene frequencies is 
not discussed. If this independence were established, then a statistical 
test could be based upon the sample correlation coefficient. 

I should like to thank Dr. D. S. Robson for suggesting the inclusion 
of this section. 


PARENT-CHILD AND SIB-SIB CORRELATIONS 


In this section, we take n = 3 for ease in illustration; the results 
are general, although the proof for n arbitrary is simpler by a stochastic 
method. We let the frequency of gene $A_i$ be represented by $a_i$, with 
the usual convention that the total frequency be unity, that is, 

$$a_1 + a_2 + a_3 = 1. \qquad (6)$$


Then the parent-child array for random mating in the case of autosomal 
genes is given in Table I. In this array, the vertical coordinate refers 
to the parent and will be denoted by z; the horizontal coordinate refers 
to the child and will be denoted by u. 


TABLE I 
PARENT-CHILD ARRAY 

                                     Child 
Parent   A₁A₁     A₁A₂          A₂A₂     A₂A₃          A₃A₃     A₃A₁ 
A₁A₁     a₁³      a₁²a₂         0        0             0        a₁²a₃ 
A₁A₂     a₁²a₂    a₁a₂(1 − a₃)  a₁a₂²    a₁a₂a₃        0        a₁a₂a₃ 
A₂A₂     0        a₁a₂²         a₂³      a₂²a₃         0        0 
A₂A₃     0        a₁a₂a₃        a₂²a₃    a₂a₃(1 − a₁)  a₂a₃²    a₁a₂a₃ 
A₃A₃     0        0             0        a₂a₃²         a₃³      a₁a₃² 
A₃A₁     a₁²a₃    a₁a₂a₃        0        a₁a₂a₃        a₁a₃²    a₁a₃(1 − a₂) 


We shall use the symbol $\sum$ to denote summation in the case where a 
straight summation is involved and also in the case when each variate 
is counted with its proper frequency; thus 

$$\sum a_i v_i = a_1 v_1 + a_2 v_2 + a_3 v_3,$$

but 

$$\sum z^2 = a_1^2 z_{11}^2 + 2a_1a_2 z_{12}^2 + a_2^2 z_{22}^2 + 2a_2a_3 z_{23}^2 + a_3^2 z_{33}^2 + 2a_3a_1 z_{31}^2,$$

where $z_{ij}$ is the weight of the genotype $A_iA_j$. 
Furthermore, if two subscripts i and j occur in a summation, the sum 
ranges over all values i ≠ j with i < j; thus 

$$\sum a_i a_j = a_1a_2 + a_2a_3 + a_3a_1.$$

With these agreements regarding notation, we immediately find 

$$\sum z = \sum u = \sum a_i v_i,$$
$$\sum z^2 = \sum u^2 = 1 - \tfrac{3}{2}\sum a_i a_j, \qquad \sum zu = 1 - \tfrac{9}{4}\sum a_i a_j. \qquad (7)$$

It is natural to define the generalized correlation coefficient¹ as 

$$r = \frac{\sum zu - (\sum z)(\sum u)}{\{[\sum z^2 - (\sum z)^2][\sum u^2 - (\sum u)^2]\}^{1/2}}.$$

Substituting the numerical results of (7) in this formula, we obtain the 
parent-child correlation 

$$r = \tfrac{1}{2}. \qquad (8)$$

The result of (8) is perhaps not too surprising; certainly it is what one 
would hope for. When the sib-sib array is formed and the correlation 
between sibs computed, it likewise is found to take the value ½ (it 
should be noted that the correlation between parents in the parent- 
parent array is, of course, equal to zero). These three results indicate 
that the method of assigning genotype weights described in the first 
section is the correct one. 
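
The value ½ can be checked by brute-force enumeration. The sketch below is an illustration under stated assumptions, not the derivation used in the paper: each genotype weight is stored as its coefficients on the homozygote vectors, scalar products use $v_i \cdot v_i = 1$ and $v_i \cdot v_j = -1/(n - 1)$, and the child is assumed to receive one of the parent's two alleles at random together with a random allele from the population. The function names and the gene frequencies are hypothetical.

```python
from itertools import product
from math import sqrt

def dot(x, y):
    """Scalar product of weights given as coefficients on v_1..v_n,
    using v_i.v_i = 1 and v_i.v_j = -1/(n - 1)."""
    n = len(x)
    off = -1.0 / (n - 1)
    return sum(x[i] * y[j] * (1.0 if i == j else off)
               for i in range(n) for j in range(n))

def weight(alleles, n):
    """Weight (v_i + v_j)/2 of the genotype A_i A_j, as coefficients on v_1..v_n."""
    w = [0.0] * n
    for a in alleles:
        w[a] += 0.5
    return w

def parent_child_r(freq):
    n = len(freq)
    mean_z, mean_u = [0.0] * n, [0.0] * n
    s_zz = s_uu = s_zu = 0.0
    # Parent carries alleles (i, j); the child gets i or j with probability 1/2,
    # plus a random allele k contributed by the other parent.
    for i, j, k in product(range(n), repeat=3):
        for passed in (i, j):
            pr = 0.5 * freq[i] * freq[j] * freq[k]
            z, u = weight((i, j), n), weight((passed, k), n)
            mean_z = [m + pr * c for m, c in zip(mean_z, z)]
            mean_u = [m + pr * c for m, c in zip(mean_u, u)]
            s_zz += pr * dot(z, z)
            s_uu += pr * dot(u, u)
            s_zu += pr * dot(z, u)
    cov = s_zu - dot(mean_z, mean_u)
    var_z = s_zz - dot(mean_z, mean_z)
    var_u = s_uu - dot(mean_u, mean_u)
    return cov / sqrt(var_z * var_u)

r3 = parent_child_r([0.2, 0.3, 0.5])        # three alleles
r4 = parent_child_r([0.1, 0.2, 0.3, 0.4])   # four alleles
```

Both calls return one half to within rounding, whatever frequencies are supplied.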


We conclude this section by computing the parent-child correlation 
for a numerical table; the data of Table II are modified from Boorman 
[1950]. The three blood genes A, B, O take the roles of $A_1$, $A_2$, $A_3$ 
respectively. 


TABLE II 
PARENT-CHILD ARRAY FOR BLOOD TYPES 
[Cell entries not recoverable; marginal totals only.] 

          AA    AB    BB    BO    OO    AO    Total 
Parent    176   67    6     159   892   690   1990 
Child     181   51    6     146   901   705   1990 


¹There is a difference of opinion as to whether this generalized correlation coefficient might not 
better be given another name. The ordinary product-moment correlation is certainly appropriate in 
the case of quantitative or metrical characters. In the case of qualitative characters, where there 
exists no scale and no well-defined order relation, as in the present discussion, the term “coefficient 
of association” has been suggested; on the other hand, once a scale has been introduced, the 
computation involved does not differ from that involved in the case of metrical characters. Consequently, 
in the case of qualitative characters, whether one prefers to speak of a coefficient of correlation or of 
a coefficient of association is basically a question of how the two terms affect the individual semantically. 


Using u to refer to the horizontal (child) coordinate and z 
to refer to the vertical (parent) coordinate, we find 

$$\sum z = 554.5 v_1 + 119 v_2 + 1316.5 v_3,$$
$$\sum u = 559 v_1 + 104.5 v_2 + 1326.5 v_3,$$
$$\sum z^2 = 1303, \qquad \sum u^2 = 1313.5, \qquad \sum zu = 949.75.$$

The value of r computed from these data is .518. 
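
The arithmetic behind this value can be retraced from the marginal totals alone. The sketch below is an illustration under stated assumptions: the parent (row) totals 176, 67, 6, 159, 892, 690 and child (column) totals 181, 51, 6, 146, 901, 705 are taken for the genotype classes AA, AB, BB, BO, OO, AO, the cross-product sum 949.75 is taken as quoted rather than recomputed, and sums are raw counts, so products of totals are divided by N = 1990. All function names are hypothetical.

```python
from math import sqrt

def dot(x, y):
    """Scalar product on (v1, v2, v3) with v_i.v_i = 1 and v_i.v_j = -1/2."""
    return sum(x[i] * y[j] * (1.0 if i == j else -0.5)
               for i in range(3) for j in range(3))

# Genotype classes in the order AA, AB, BB, BO, OO, AO (alleles A=0, B=1, O=2).
genotypes = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 0)]
parent_totals = [176, 67, 6, 159, 892, 690]   # assumed row totals
child_totals = [181, 51, 6, 146, 901, 705]    # assumed column totals
sum_zu = 949.75                               # quoted in the text, not recomputed
N = sum(parent_totals)

def weight(g):
    w = [0.0, 0.0, 0.0]
    for allele in g:
        w[allele] += 0.5
    return w

def vector_sum(totals):
    """Coefficients of v1, v2, v3 in the sum of weights over all individuals."""
    return [sum(t * weight(g)[i] for t, g in zip(totals, genotypes)) for i in range(3)]

def square_sum(totals):
    return sum(t * dot(weight(g), weight(g)) for t, g in zip(totals, genotypes))

sz, su = vector_sum(parent_totals), vector_sum(child_totals)
cov = sum_zu - dot(sz, su) / N
var_z = square_sum(parent_totals) - dot(sz, sz) / N
var_u = square_sum(child_totals) - dot(su, su) / N
r = cov / sqrt(var_z * var_u)
```

Under these assumptions `sz` comes out as 554.5, 119 and 1316.5, and `r` rounds to .518.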


SEX-LINKED GENES 


In the case of a sex-linked character, the male parent can have only 3 
possible genotypes, that is, $A_1$, $A_2$, or $A_3$. We immediately find that, 
for females, $\sum z$ and $\sum z^2$ are the same as before, that is, the female 
variance is given by 

$$\sigma^2(\text{Females}) = \tfrac{3}{2} \sum a_i a_j. \qquad (9)$$

For males, we find that 

$$\sum z = \sum a_i v_i, \qquad \sum z^2 = 1,$$

and the male variance is given by 

$$\sigma^2(\text{Males}) = 3 \sum a_i a_j. \qquad (10)$$

Formulae (9) and (10) are used in computing all the parent-child and 
sib-sib correlations of this section; as an illustration, we work out the 
correlation in the brother-sister case, which is one of the more 
complicated ones. The necessary array follows. (In this array, and in the 
remainder of the paper, we shall use A, B, and C rather than $A_1$, $A_2$, 
and $A_3$; the gene frequencies will then appear as a, b, and c, and the 
formulae will be shorter and simpler in appearance.) 



TABLE III 
ARRAY FOR SEX-LINKED GENES 

                                        Sister 
Brother  AA           AB            BB           BC            CC           CA 
A        ½a(a² + a)   ½ab(1 + 2a)   ½ab²         abc           ½ac²         ½ac(1 + 2a) 
B        ½a²b         ½ab(1 + 2b)   ½b(b² + b)   ½bc(1 + 2b)   ½bc²         abc 
C        ½a²c         abc           ½b²c         ½bc(1 + 2c)   ½c(c² + c)   ½ac(1 + 2c) 

For this array, we find that $\sum zu = \sum a_i^2 - \tfrac{1}{4} \sum a_i a_j$; combining this 
result with formulae (9) and (10), we obtain a brother-sister correlation 
of $1/(2\sqrt{2})$. 

The correlation obtained in the last paragraph between brother and 
sister turned out to be the same as for the case of 2 alleles; this result 
also holds for all the other parent-child and sib-sib correlations, that is, 
these correlations are 0 for father-son, $1/\sqrt{2}$ for father-daughter and 
mother-son, ½ for mother-daughter and brother-brother, ¾ for sister- 
sister. These results can also be obtained by using the I, T, and O 
matrices of Li and Sacks [1954], since it is readily verified that the 
method of weighting which we have introduced in the first section gives 
the same means and variances for the I, T, and O populations. 
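
The brother-sister value can also be confirmed by enumerating the family directly. The sketch below is an illustration under stated assumptions: the mother carries two alleles drawn at random, the father one; the brother receives one maternal allele, and the sister receives the father's allele together with an independently drawn maternal allele. Weights are stored as coefficients on $v_1, v_2, v_3$, and the function names and gene frequencies are hypothetical.

```python
from itertools import product
from math import sqrt

def dot(x, y):
    """Scalar product on (v1, v2, v3) with v_i.v_i = 1 and v_i.v_j = -1/2."""
    return sum(x[i] * y[j] * (1.0 if i == j else -0.5)
               for i in range(3) for j in range(3))

def unit(i):
    w = [0.0, 0.0, 0.0]
    w[i] = 1.0
    return w

def brother_sister_r(freq):
    mean_z, mean_u = [0.0] * 3, [0.0] * 3
    s_zz = s_uu = s_zu = 0.0
    for i, j, f in product(range(3), repeat=3):      # mother (i, j), father f
        for p, q in product((i, j), repeat=2):       # maternal allele to brother, to sister
            pr = 0.25 * freq[i] * freq[j] * freq[f]
            z = unit(p)                                          # brother: single allele
            u = [0.5 * (x + y) for x, y in zip(unit(f), unit(q))]  # sister: paternal + maternal
            mean_z = [m + pr * c for m, c in zip(mean_z, z)]
            mean_u = [m + pr * c for m, c in zip(mean_u, u)]
            s_zz += pr * dot(z, z)
            s_uu += pr * dot(u, u)
            s_zu += pr * dot(z, u)
    cov = s_zu - dot(mean_z, mean_u)
    return cov / sqrt((s_zz - dot(mean_z, mean_z)) * (s_uu - dot(mean_u, mean_u)))

r_bs = brother_sister_r([0.2, 0.3, 0.5])
```

The result agrees with $1/(2\sqrt{2}) \approx 0.3536$ for any choice of frequencies.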


DOMINANCE 
We shall refer to the three possible varieties of dominance as follows: 


Type I: A dominates B and C, B dominates C; 
Type II: A dominates both B and C, neither B nor C dominates; 
Type III: A and B dominate C, neither A nor B dominates the other. 


In the case of dominance with 3 alleles, unlike the case of 2 alleles, we 
can not employ stochastic matrices; here we shall record the results in 
Table IV. 


TABLE IV 
CORRELATIONS WITH TYPE I DOMINANCE 

Quantity          Value                                                 Notation 
Male variance     3(ab + bc + ca)                                       3M₁ 
Female variance   3[(a² + 2ab + 2ac)(b + c)² + c²(b² + 2bc)]            3Q₁ 
Parent-child      1 − [a(b + c)² + bc²]/Q₁ = 1 − P₁/Q₁                  r_PC 
Child-child       r_PC + (a²b² + a²c² + b²c² + 2a²bc + 2abc²)/4Q₁       r_CC 
Father-son        0                                                     r_FS 
Father-daughter   (P₁ − ½abc)/√(M₁Q₁)                                   r_FD 
Mother-son        r_FD                                                  r_MS 
Mother-daughter   r_PC                                                  r_MD 
Brother-brother   ½                                                     r_BB 
Brother-sister    ½r_FD                                                 r_BS 
Sister-sister     1 − P₁/2Q₁                                            r_SS 


In the succeeding tables, we shall omit the correlations for father-son, 
mother-son, mother-daughter, brother-brother, brother-sister, and 
sister-sister, since they can always be expressed (as above) in terms of 
the parent-child or father-daughter correlations. 


TABLE V 
CORRELATIONS WITH TYPE II DOMINANCE 

Quantity          Value 
Male variance     3M₁ 
Female variance   3Q₂ = 3[(a + ab + bc + ca)(b + c)² − ½bc] 
Parent-child      1 − P₂/Q₂, where P₂ = a(b + c)² + ¾bc(b + c) − ½bc 
Child-child       r_PC + [½abc + a²(b² + bc + c²)]/4Q₂ 
Father-daughter   [P₂ + ¼bc(b + c)]/√(M₁Q₂) 


TABLE VI 
CORRELATIONS WITH TYPE III DOMINANCE 

Quantity          Value 
Female variance   3Q₃ = 3[½ab + 2abc + abc² + c² − c⁴] 
Parent-child      1 − P₃/Q₃, where P₃ = ¼(ab + 5abc) + c² − c³ 
Child-child       r_PC + [½abc + c²(a² + ab + b²)]/4Q₃ 
Father-daughter   [P₃ + ¼ab(1 − c)]/√(M₁Q₃) 


We conclude this section by computing the theoretical and actual 
correlations for the original blood-group data of Table VII as given by 
Boorman [1950]; we use the values a = .27177, b = .05543, c = .67280 
given by Li [1955] for the gene frequencies. 


TABLE VII 
BOORMAN'S DATA ON PARENT-CHILD BLOOD GROUPS 

                    Child 
Parent    A     AB    B     O     Total 
A         596   19    28    223   866 
AB        34    10    23    0     67 
B         29    22    58    56    165 
O         227   0     43    622   892 
Total     886   51    152   901   1990 


The theoretical value for the correlation in this table is 

$$r_{PC} = 1 - \frac{ab + 5abc + 4c^2(1 - c)}{4(\tfrac{1}{2}ab + 2abc + abc^2 + c^2 - c^4)} = .417.$$

To compute the correlation from the table, we find 

$$\sum z = 911.5 v_1 + 177.5 v_2 + 901 v_3,$$
$$\sum u = 899.5 v_1 + 198.5 v_2 + 892 v_3,$$
$$\sum z^2 = 1951.75, \qquad (\sum z)^2 = 531159.25,$$
$$\sum u^2 = 1939.75, \qquad (\sum u)^2 = 486199.75,$$
$$\sum zu = 1000.00, \qquad (\sum z)(\sum u) = 508180.00, \qquad r = .441.$$

The agreement seems good. 
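
Both figures can be retraced from Table VII and the quoted gene frequencies. The sketch below is an illustration under stated assumptions: the phenotypes A, AB, B, O are given the Type III weights $v_1$, $\tfrac{1}{2}(v_1 + v_2)$, $v_2$, $v_3$; sums are raw counts, so products of totals are divided by N; and the theoretical value uses $1 - P_3/Q_3$ with $P_3 = \tfrac{1}{4}(ab + 5abc) + c^2 - c^3$ and $Q_3 = \tfrac{1}{2}ab + 2abc + abc^2 + c^2 - c^4$, a reading of the printed expression that is assumed here. Variable names are hypothetical.

```python
from math import sqrt

def dot(x, y):
    """Scalar product on (v1, v2, v3) with v_i.v_i = 1 and v_i.v_j = -1/2."""
    return sum(x[i] * y[j] * (1.0 if i == j else -0.5)
               for i in range(3) for j in range(3))

labels = ["A", "AB", "B", "O"]
# Assumed phenotype weights under Type III dominance, as coefficients on (v1, v2, v3).
w = {"A": (1, 0, 0), "AB": (0.5, 0.5, 0), "B": (0, 1, 0), "O": (0, 0, 1)}
table = [            # rows: parent A, AB, B, O; columns: child A, AB, B, O
    [596, 19, 28, 223],
    [34, 10, 23, 0],
    [29, 22, 58, 56],
    [227, 0, 43, 622],
]
N = sum(map(sum, table))
row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]

def vector_sum(totals):
    return [sum(t * w[l][i] for t, l in zip(totals, labels)) for i in range(3)]

def square_sum(totals):
    return sum(t * dot(w[l], w[l]) for t, l in zip(totals, labels))

sum_cross = sum(table[r][c] * dot(w[labels[r]], w[labels[c]])
                for r in range(4) for c in range(4))
sp, sc = vector_sum(row_tot), vector_sum(col_tot)
cov = sum_cross - dot(sp, sc) / N
var_p = square_sum(row_tot) - dot(sp, sp) / N
var_c = square_sum(col_tot) - dot(sc, sc) / N
r_obs = cov / sqrt(var_p * var_c)

a, b, c = 0.27177, 0.05543, 0.67280     # gene frequencies quoted from Li [1955]
p3 = 0.25 * (a * b + 5 * a * b * c) + c**2 - c**3
q3 = 0.5 * a * b + 2 * a * b * c + a * b * c**2 + c**2 - c**4
r_theory = 1 - p3 / q3
```

Under these assumptions the cross-product sum is 1000, `r_obs` rounds to .441 and `r_theory` to .417.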


SUMMARY 


By placing the pure genotypes symmetrically at the vertices of a 
regular simplex and using vectorial weights, it has been possible to 
compute genetic correlations in the case of n alleles. The actual values 
are given for all possible correlations when only 3 alleles are involved 
and when any type of dominance may be present. 


ANALYSIS OF EXPERIMENTS MEASURING 
THRESHOLD TASTE* 


K. Harris 


Robert A. Taft Sanitary Engineering Center 
Public Health Service, Cincinnati, Ohio, U.S.A. 


Introduction 


In the taste tests considered here, members of a panel are asked to 
make a series of determinations, each of which involves a fixed number 
of liquid preparations—two in some experiments, three in others. Each 
group of preparations includes one or two beakers containing a given 
concentration of the test material while the remaining beakers contain 
only the solvent. The subject familiarizes himself with the taste of the 
solvent before engaging in the test proper. Then, if three solutions are 
included in each test group, he is asked to select the odd-tasting one 
and state whether it is a blank or a concentrate. In this case, the 
probability, p, of correct assignment by chance is 1/6, whereas when 
only two beakers per group are examined, p = 1/2. 

The groups are presented in geometrically increasing concentra- 
tions of the test substance. This practice, plus frequent rinsing of the 
mouth with the solvent, helps to minimize cumulative effects which 
would bias the individual’s apparent threshold downward (i.e., make 
him appear more sensitive than he really is) or adaptive effects which 
might exert an upward bias. Admittedly, trying to make each successive 
discrimination independent of the preceding one is a difficult 
task which has received considerable attention [cf. U. S. Dept. of 
Agric., 1951, pp. 29-30]. 

The purpose of these tests is to learn something about the prob- 
ability distribution of individual threshold concentrations, as represented 
by the panel members. Whether the information gained may be 
translated with confidence to a large population of individuals depends 
first on prior definition of this population and then on selection of the 
panel as a probability sample or, at least, as a reasonably unbiased 
representation of the distribution of thresholds in the population. 
Alternatively, we might conceive of a population generated by repeated 
experiments using the same panel members and test substance. 


*Presented at a joint meeting of the Biometric Society (ENAR), Institute of Mathematical 
Statistics, and American Statistical Association (SPES), March 19, 1959, Pittsburgh, Pennsylvania. 

In the following section, two methods are derived for estimating 
threshold distributions. These procedures are then applied to specific 
examples and their relative merits noted. Finally, confidence limits 
appropriate to each method are discussed. 


Methods 


In both methods of analysis an individual's true threshold is defined 
to be a concentration, $X_0$, such that for all $X > X_0$ the individual is 
certain to discriminate correctly between concentrate and blank, but 
for $X < X_0$ correct identifications will occur only by chance. As will 
be seen, we actually allow $X_0$ a range between two adjacent 
concentrations since the threshold distribution function is evaluated only at those 
concentrations used in the experiment. Further discussion of this 
definition is deferred to the end of this section. Successive 
discriminations are assumed independent, which implies that the individual's 
threshold remains constant during the course of the test. 

1. Direct estimation of threshold distribution—a nonparametric 
procedure. 

Since groups of beakers are presented in geometrically increasing 
concentrations of the test substance, the i-th concentration (in parts 
per million, say) may be written $X_i = b^i$, where b is a constant ratio 
and i = h, h + 1, ..., k. For example, b = 4, h = −2, k = +3 would 
describe the concentration series, $X_i$ = .062, .25, 1.0, ..., 64 ppm. 

Assuming for the moment correct identification at least within the 
group containing the highest concentration, there will be some value 
of i, say i', such that at all i ≥ i' correct decisions are made, while at 
i = i' − 1 the taster's judgment is wrong. His apparent threshold, 
say X', is then assigned the value $\log_b X' = i' - \tfrac{1}{2}$. When the groups 
are all identified correctly, or all incorrectly, X' is indeterminate. 
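
The rule for the apparent threshold is easily mechanized. The sketch below is an illustration only: it assumes a subject's responses are held as a list of booleans ordered from the lowest log-concentration h to the highest, and it returns `None` whenever the apparent threshold is indeterminate, which is taken here to include any failure at the highest concentration. The function name and the sample responses are hypothetical.

```python
def apparent_log_threshold(correct, h):
    """correct[n] is True if the group at log-concentration h + n was identified correctly.

    Returns i' - 1/2, where i' is the lowest log-concentration from which every
    judgment is correct, or None when the apparent threshold is indeterminate.
    """
    if all(correct) or not correct[-1]:
        return None                      # all correct, or wrong at the highest concentration
    last_wrong = max(n for n, ok in enumerate(correct) if not ok)
    i_prime = h + last_wrong + 1         # first concentration of the unbroken run of successes
    return i_prime - 0.5

# Series b = 4, h = -2, k = 3; responses at i = -2, -1, 0, 1, 2, 3.
responses = [True, False, True, False, True, True]
log_x = apparent_log_threshold(responses, h=-2)
x_apparent = 4 ** log_x                  # apparent threshold in ppm
```

Here the last error occurs at i = 1, so i' = 2 and the apparent log threshold is 1.5. The two early successes at i = −2 and i = 0 are ignored, which is exactly the source of the downward bias discussed next.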

It is clear from the foregoing rules that the subject’s apparent 
threshold will tend to be less than his true threshold, because chance 
alone may permit correct identification at one or more successively 
decreasing concentrations below his true threshold point. Since the 
probabilities for such occurrences are known (i.e., the value of p is 
determined by the design of the test), the relative frequencies of apparent 
thresholds over a panel may be adjusted to produce the estimated 
cumulative density function (c.d.f.) of true thresholds, evaluated at 
the points h, h + 1, ..., k. 

The adjustment formulae are simple. To derive them, we first 
introduce the variable j' = i' − h − 1. Then, for h + 1 ≤ i' ≤ k, 
j' = 0, 1, 2, ..., (k − h − 1). For indeterminately low or high values 
of X', j' < 0 or j' > (k − h − 1), respectively. Let F(j') be the observed 
relative frequency of j' over the panel. Finally, let P(−∞, h), 
P(h, h + 1), ..., P(k − 1, k), P(k, +∞) denote estimated areas under 
the true $\log_b$ threshold probability distribution, evaluated between 
successive pairs of points. 

The relationships between the observed variables j' and the P's are 
indicated in Table 1. Each cell in this table shows the conditional 
probability of j' given that the true log threshold, say $i_0$, lies within a 
specified range. 


TABLE 1 
PROBABILITY OF j' FOR GIVEN RANGE OF TRUE THRESHOLD 

Range of i₀ = log_b                                 j' 
of true threshold    <0          0            1            2     ...   (k−h−1)   >(k−h−1) 
i₀ ≤ h               1           0            0            0     ...   0         0 
h < i₀ ≤ h+1         p           q            0            0     ...   0         0 
h+1 < i₀ ≤ h+2       p²          pq           q            0     ...   0         0 
h+2 < i₀ ≤ h+3       p³          p²q          pq           q     ...   0         0 
...
k−1 < i₀ ≤ k         p^(k−h)     p^(k−h−1)q   p^(k−h−2)q   ...         q         0 
i₀ > k               p^(k−h+1)   p^(k−h)q     p^(k−h−1)q   ...         pq        q 

For example, consider the probability of j' = 0 in the case of a 
taster whose true log threshold $i_0$ lies in the range $h + 2 < i_0 \le h + 3$. 
The value j' = 0 derives from an incorrect identification in the first 
group, containing the lowest concentration $b^h$, but correct discrimination 
in all succeeding groups. Since his true log threshold lies between h + 2 
and h + 3, the subject's success in detecting concentrations $b^{h+1}$ and 
$b^{h+2}$ would be only a chance event with probability $p^2$. His overall 
probability of being wrong on $b^h$ and right on the next two higher 
concentrations is therefore $p^2 q$, where q = 1 − p, as given in Table 1. 
The sum of each row in the table is unity. 

From Table 1 we may immediately write the equations relating 
F(j') to the P's. These are 

$$\text{exp. } F(j' < 0) = 1 \cdot P(-\infty, h) + pP(h, h+1) + \cdots + p^{k-h+1} P(k, \infty),$$
$$\text{exp. } F(j' = 0) = 0 \cdot P(-\infty, h) + qP(h, h+1) + \cdots + p^{k-h} q P(k, \infty),$$
$$\vdots$$
$$\text{exp. } F(j' > k - h - 1) = 0 \cdot P(-\infty, h) + 0 \cdot P(h, h+1) + \cdots + qP(k, \infty),$$

which have as solutions 

$$P(-\infty, h) = [qF(j' < 0) - pF(0)]/q,$$
$$P(h, h+1) = [F(0) - pF(1)]/q,$$
$$P(h+1, h+2) = [F(1) - pF(2)]/q, \qquad (1)$$
$$\vdots$$
$$P(k-1, k) = [F(k-h-1) - pF(j' > k-h-1)]/q,$$
$$P(k, \infty) = F(j' > k-h-1)/q.$$


The solutions sum to unity, of course, but there is no assurance 
that the right-hand sides of the above equations will always be positive 
even if the observed F(j') are smoothed so that F(j') ≠ 0 for any j'. 
If the F(j') follow a unimodal form, the solutions for P(−∞, h) and, 
possibly, P(h, h + 1) may turn out negative; if the form is bimodal, a 
negative solution may be obtained in the range between the modes. 
Such a solution will usually be close to zero, and the difficulty can be 
overcome by summing the negative result and an adjacent, numerically 
larger positive result to produce a positive solution over the combined 
range. 
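
The adjustment and the model behind it fit in a few lines. The sketch below is an illustration under stated assumptions: the frequencies are held in a list ordered as F(j' < 0), F(0), ..., F(k − h − 1), F(j' > k − h − 1), the areas are returned in the matching order P(−∞, h), ..., P(k, ∞), and `expected_F` applies the conditional probabilities of Table 1 so that the two functions can be checked against each other. The function names and the numerical values are hypothetical.

```python
def adjust(F, p):
    """Solve for the interval areas P from the frequencies F of apparent thresholds."""
    q = 1.0 - p
    P = [(q * F[0] - p * F[1]) / q]                 # P(-inf, h)
    for idx in range(1, len(F) - 1):
        P.append((F[idx] - p * F[idx + 1]) / q)     # P(h + idx - 1, h + idx)
    P.append(F[-1] / q)                             # P(k, +inf)
    return P

def expected_F(P, p):
    """Expected frequencies of j' implied by the areas P (rows of Table 1)."""
    q = 1.0 - p
    F = [0.0] * len(P)
    for n, area in enumerate(P):        # n = number of concentrations below the true threshold
        F[0] += area * p ** n           # every chance judgment happens to be right
        for j in range(n):              # wrong at position j, right by chance above it
            F[j + 1] += area * q * p ** (n - 1 - j)
    return F

p = 1.0 / 6.0                                   # three beakers per group
P_true = [0.05, 0.10, 0.30, 0.35, 0.15, 0.05]   # hypothetical areas
F = expected_F(P_true, p)
P_back = adjust(F, p)
```

With exact expected frequencies the adjustment recovers the areas; with observed frequencies the same call can return small negative values, to be pooled with a neighbouring interval as described above.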

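The estimated c.d.f. of the true log threshold is given by 

$$P_c = \text{est. } \Pr(i_0 \le h + j_0) = F(j' < 0) + \sum_{j'=0}^{j_0 - 1} F(j') - pF(j_0)/q, \qquad (2)$$

where $j_0$ = 0, 1, 2, ..., k − h, and F(k − h) is read as F(j' > k − h − 1). 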

This procedure may be contrasted with a common formula which 
adjusts for chance the proportion of correct responses observed at each 
concentration. In the present notation, this formula reads 

$$P_c = (P_o - p)/(1 - p), \qquad (3)$$

where $P_o$ denotes the observed proportion. 

This equation in one form or another is at least 25 years old, being 
mentioned by Guilford [1936] as a customary correction. Recently it 
has been used by Harrison and Elder [1950], Gridgeman [1955], who 
also gives a derivation, Berg et al. [1955], Filipello [1956], and 
doubtless others. It is clearly biased at the upper extreme since, when $P_o = 1$, 
$P_c$ must also equal one despite the fact that some of the correct 
identifications may be guesses. This bias appears to extend to values of $P_o$ 
less than one; at least, in our experience, Equation (3) has almost 
always yielded estimates greater than or (occasionally) equal to those 
given by Equation (2). Naturally, such differences diminish as p gets 
smaller. 

The writer is not aware of any other proposed formulas and suggests 
that estimates of the cumulative distribution of individual thresholds 
should be derived only from an explicit definition of threshold. Such 
estimates are conditional upon the definition which is, of course, always 
open to argument and improvement. 

2. Estimating the true threshold distribution from a mathematical 
model. 

An entirely different approach to estimating the true threshold 
distribution arises from the following argument. In accordance with 
the definition of threshold stated at the beginning of this section, each 
panel member may be characterized by a certain number of test 
concentrations, say $n_s$, for which correct discrimination between 
concentrate and blank is a matter of chance, with probability p. The 
probability of m correct identifications out of these n (omitting the 
subscript s) is then $P(m) = \binom{n}{m} p^m q^{n-m}$. Of course, neither m nor n is directly 
observable; rather, n is a random variable whose estimated c.d.f. 
represents essentially the true threshold c.d.f. which is desired. Let 
N (= k − h + 1) denote the total number of concentrations employed 
and t the total number of correctly identified groups, known for each 
subject. Then, again under the definition of threshold, we may 
immediately write 

$$t = m + N - n. \qquad (4)$$

Equation (4) represents the basic mathematical model from which 
the true threshold c.d.f. will be estimated. However, it is more 
convenient to consider the known variable v = N − t = n − m. Since 
the joint distribution of m and n is $\binom{n}{m} p^m q^{n-m} f(n)$, the unconditional 
distribution of v may be written 

$$g(v) = \sum_{n=v}^{N} \binom{n}{v} q^v p^{n-v} f(n) \qquad (v = 0, 1, \ldots, N). \qquad (5)$$


The simplest form of f(n) is another binomial, say 

$$f(n) = \binom{N}{n} \theta^n (1 - \theta)^{N-n}.$$

This assumes that the thresholds of panel members form a homogeneous 
sample which might just as well represent, say, the day-to-day variation 
of a single individual. Substituting this binomial expression for f(n) 
into Equation (5), we obtain a binomial-binomial compound 
distribution, easily shown to be another binomial: 

$$g(v) = \binom{N}{v} (\theta q)^v (1 - \theta q)^{N-v}. \qquad (6)$$
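
That the compound of two binomials is again a binomial can be checked numerically. The sketch below is an illustration only: it assumes that the number n of sub-threshold concentrations is binomial with index N and a parameter written `theta` (the symbol is an assumption), that each chance judgment is wrong with probability q = 1 − p, and it compares the summed compound probabilities with a single binomial having parameter `theta * q`. All numerical values are hypothetical.

```python
from math import comb

def g_compound(v, N, theta, q):
    """Probability of v incorrect identifications, summing over the unobserved n."""
    p = 1.0 - q
    return sum(
        comb(n, v) * q**v * p**(n - v)                    # v failures among n chance judgments
        * comb(N, n) * theta**n * (1.0 - theta)**(N - n)  # binomial weight f(n)
        for n in range(v, N + 1)
    )

def g_binomial(v, N, theta, q):
    """Single binomial with index N and parameter theta * q."""
    return comb(N, v) * (theta * q)**v * (1.0 - theta * q)**(N - v)

N, theta, q = 6, 0.4, 5.0 / 6.0       # six concentrations, p = 1/6
compound = [g_compound(v, N, theta, q) for v in range(N + 1)]
single = [g_binomial(v, N, theta, q) for v in range(N + 1)]
```

The two lists agree term by term, which is the thinning property of the binomial: each of the N concentrations independently yields an error with probability `theta * q`.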
Experience at the Sanitary Engineering Center has indicated that 
in some tests a binomial form of f(n) does not adequately explain the 
variation in individual thresholds. Seeking to improve the fit by 
introducing variability in $\theta$, we have in fact been led to an interesting 
application of the binomial-gamma model used to describe variation 
in the survival of bacteria in sea water (Harris [1958]). Under this 
model, we assume that each subject possesses his own value of $\theta$, say 
$\theta_s$, that $\theta_s$ may be defined as $e^{-\delta_s}$, and that $\delta_s$ follows a gamma 
distribution, 


g(,) = al 


Another approach assumes that $\theta_s$ follows a beta distribution, e.g., 
Chiang [1951]. The writer prefers the binomial-gamma model for two 
reasons: (1) the parameters of the binomial-beta compound distribution 
are more difficult to estimate, requiring expert statistical competence 
plus considerable labor; (2) more important, the parameter $\delta_s$ may be 
precisely defined and discussed in terms of an individual's reaction to 
the sensory stimuli provided by the taste test. 

Imagine that each subject requires some minimum number of taste- 
stimulating molecules of a given substance in order to distinguish a 
bottle containing the dissolved material from one containing the solvent 
alone. This minimum number comprises a packet and $\delta_s$ represents the 
average number of such packets per volume of concentrate tasted by 
subject s (i.e., averaged over the N concentrations in the test). As 
noted in the following paragraphs, we distinguish between less than 
and at least one packet per volume. Further than this, however, no 
numerical definition of the number of molecules of material needed 
to stimulate taste is required. 

Consider, for example, a panel member who is quite insensitive to 
the test material. The number of molecules he requires to make up a 
packet may be so large that even the highest concentration contains less 
than this number in the volume tested. For this individual, every 
concentration in the test contains effectively zero packets, leading to 
an average value, $\delta_s$, of zero packets per concentration. Hence, 
$\theta_s = 1$ and $n_s = N$, implying that all of his judgments are chance 
events. 

Now consider a moderately sensitive subject for whom the two 
highest concentrations each contain at least one packet of molecules 
but lower concentrations contain less than one packet—effectively, 
zero packets. His first N - 2 judgments are in fact chance events. 
In this case, $\delta_s$ is greater than 0 although its exact value remains 
unknown. Finally, a highly sensitive individual will find a very large 
number of packets, especially in the higher concentrations. He will 
be characterized by a large value of $\delta_s$, leading to $\theta_s$ close to zero and 
$n_s$ probably equal to zero. 

The proposed model assumes that these individual values of $\delta_s$ 
follow a distribution of the gamma form with parameters $\alpha$ and $\beta$. 
The writer admits having no empirical evidence to support a gamma 
model. However, this distribution type, with a range from 0 to $\infty$, 
as required by the definition of $\delta_s$, is quite flexible, assuming an 
exponential form when $\alpha = 0$ and moving through stages of decreasing 
skewness towards normality as $\alpha$ increases. Moreover, as shown in the 
earlier paper (op. cit., [1958]), compounding a gamma model with the 
binomial (5) leads to a highly versatile form for the distribution of v, 
the number of incorrect identifications, namely, 

$$g(v) = \binom{N}{v}\,q^{v}\sum_{k=0}^{N-v}(-1)^{k}\binom{N-v}{k}\,q^{k}\left(\frac{\beta}{\beta+v+k}\right)^{\alpha+1}$$

(k here is simply an index of summation, not to be confused with its 
use as the log of the highest concentration). 

This discrete binomial-gamma model assumes a wide variety of 
shapes for varying $\alpha$ and $\beta$ including both unimodal and bimodal 
forms. It should therefore represent reasonably well even a highly 
dispersed population of thresholds. 
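
The variety of shapes can be explored numerically. The following is a minimal sketch, assuming the model as described above: v is binomial with parameter $\theta q$, $\theta = e^{-\delta}$, and $\delta$ follows a gamma distribution with shape $\alpha + 1$ and rate $\beta$, so that $E(\theta^{m}) = [\beta/(\beta+m)]^{\alpha+1}$. The function name `binomial_gamma_pmf` and its argument names are illustrative, and the parameter values are those quoted for the ferrous sulfate test.

```python
from math import comb

def binomial_gamma_pmf(N, q, alpha_plus_1, beta):
    """Return [g(0), ..., g(N)], the compound binomial-gamma probabilities of v.

    Assumes v | theta ~ Binomial(N, theta * q) with theta = exp(-delta) and
    delta ~ gamma(shape = alpha + 1, rate = beta).
    """
    def theta_moment(m):
        # E[theta^m] = E[exp(-m * delta)] for a gamma-distributed delta
        return (beta / (beta + m)) ** alpha_plus_1

    pmf = []
    for v in range(N + 1):
        # Expand (1 - theta*q)^(N - v) binomially and take expectations termwise
        total = 0.0
        for k in range(N - v + 1):
            total += (-1) ** k * comb(N - v, k) * q ** k * theta_moment(v + k)
        pmf.append(comb(N, v) * q ** v * total)
    return pmf

# Illustrative values: N = 7 concentrations, q = 1/2
probs = binomial_gamma_pmf(N=7, q=0.5, alpha_plus_1=0.385, beta=0.443)
mean_v = sum(v * p for v, p in enumerate(probs))
```

Varying `alpha_plus_1` and `beta` in this sketch and inspecting `probs` shows how the shape changes; the alternating sum is adequate for the small N used in taste tests but may lose accuracy for large N.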


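First and second factorial moments of g(v) are given by the 
expressions, 

$$E(v) = Nq\left(\frac{\beta}{1+\beta}\right)^{\alpha+1}, \qquad E[v(v-1)] = N(N-1)\,q^{2}\left(\frac{\beta}{2+\beta}\right)^{\alpha+1}.$$

If $S_1 = (1/Nq)$ mean $v$ and $S_2 = [1/N(N-1)q^{2}]$ mean $v(v-1)$, 
then $K = \log S_2/\log S_1$ is an estimate of $\log[\beta/(2+\beta)]/\log[\beta/(1+\beta)]$. 

An extensive table of $\beta$ as a function of $K$ has been computed and 
is available on request. 

Finally, $\hat\alpha + 1 = \log S_1/\log[\hat\beta/(1+\hat\beta)]$. 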
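
In place of the table, $\beta$ can be found numerically. The following is a minimal sketch, assuming the relations $K = \log S_2/\log S_1$, $K \approx \log[\beta/(2+\beta)]/\log[\beta/(1+\beta)]$ and $\alpha + 1 = \log S_1/\log[\beta/(1+\beta)]$; the right-hand side of the second relation increases steadily from 1 to 2 as $\beta$ grows, so bisection applies. The function names and the search bracket are illustrative choices, and the inputs are the ferrous sulfate values $S_1 = .6349$ and $S_2 = .5185$.

```python
from math import log, sqrt

def k_of_beta(beta):
    # Ratio log[beta/(2+beta)] / log[beta/(1+beta)], increasing in beta
    return log(beta / (2 + beta)) / log(beta / (1 + beta))

def estimate_beta_alpha(S1, S2, lo=1e-6, hi=1e6, iterations=200):
    """Solve k_of_beta(beta) = K by bisection on a log scale (bracket is an assumption)."""
    K = log(S2) / log(S1)
    if not (k_of_beta(lo) < K < k_of_beta(hi)):
        raise ValueError("K lies outside the range covered by the search bracket")
    for _ in range(iterations):
        mid = sqrt(lo * hi)          # geometric midpoint of the bracket
        if k_of_beta(mid) < K:
            lo = mid
        else:
            hi = mid
    beta = sqrt(lo * hi)
    alpha_plus_1 = log(S1) / log(beta / (1 + beta))
    return K, beta, alpha_plus_1

K, beta_hat, alpha_plus_1_hat = estimate_beta_alpha(S1=0.6349, S2=0.5185)
```

With these inputs the sketch gives values close to the $\hat\beta = .443$ and $\hat\alpha + 1 = .385$ reported for the ferrous sulfate test.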

These values $\hat\beta$ and $\hat\alpha + 1$ are then inserted into the binomial-gamma 
form of f(n), namely, 

$$f(n) = \binom{N}{n}\sum_{k=0}^{N-n}(-1)^{k}\binom{N-n}{k}\left(\frac{\beta}{\beta+n+k}\right)^{\alpha+1},$$

to estimate the c.d.f. of n, and thus of the true threshold at the points 
h, h + 1, ..., k. When the binomial (6) is adequate to describe the 
variation in v, we estimate $\pi = \theta q$ followed by $\hat\theta = \hat\pi/q$, which is then 
used in the simple binomial form of f(n) to estimate the desired c.d.f. 

The statistic K normally varies between 1 and 2, sometimes slightly 
exceeding 2 through sampling variation. Our experience has been that 
for K < 1.8, use of the binomial-gamma in place of the simple binomial 
model substantially improves the fit. 

Some investigators may object to the definition of threshold under- 
lying these methods of analysis. A more realistic definition might 
locate a person’s threshold at a concentration where the probability p 
of correct identification begins to rise above the chance level. At this 
point, p becomes an increasing function of concentration, approaching 
unity asymptotically. As indicated earlier, the present methodology 
will accommodate this definition if practically the entire function lies 
between two adjacent test concentrations. 

However appealing such a definition might be, it would not satisfy 
the needs of an administrative official seeking the maximum concentration 
tolerable without undue complaint by a large population. 
Inevitably, some value of p would have to be selected arbitrarily for 
control purposes. Apart from this nonstatistical problem, statistical 
difficulties also arise. The function p = f(X) must include parameters 
of shape in addition to the threshold concentration, all of which may 
vary from one individual to another. Problems of modelling and 
estimation would be considerable, while a nonparametric method would 
be out of the question. 


Examples 


The Sanitary Engineering Center of the Public Health Service is 
frequently called upon to conduct threshold taste and odor tests in 
order that responsible official agencies may have some basis for setting 
limits to the permissible concentrations of organic and inorganic 
materials in public water supplies. For reasons of economy and con- 
venience, panels have been composed largely of chemists and chemical 
technicians working at the Sanitary Engineering Center. Most likely, 
such panels do not represent an unbiased sample of thresholds in the 
general population; in fact, they may well provide an insurance factor 
by being more sensitive than the public to taste and odor-producing 
compounds in water. From the sampling standpoint, however, the 
only population which can be referred to is that produced by repeated 
use of the given panel on the same test material. 

Even if the panel were an exact replica of a municipal population, 
the problem would still remain that the material being tested is a pure 
compound in a more or less artificial solvent. In any case, therefore, 
the results are not immediately applicable to a public water supply 
containing this compound and many others, possibly interacting, in 
water of changeable temperature, turbidity and so on. Put another 
way, laboratory tests such as these serve at best to determine the 
sensory effects of a particular constituent rather than to measure the 
reaction of the general consumer to the product as a whole. 

The following examples come from a recent series of fifteen tests. 
In the first example a panel of seventeen members tasted solutions of 
zinc chloride in filtered spring water (containing minerals but no organic 
matter) at concentrations of 2, 4, 8, 16, 32 and 64 ppm (b = 2, h = 1, 
k = 6) in groups of three bottles so that p = 1/6. The second test 
involved an 18-member panel tasting ferrous sulfate in distilled water 
at concentrations of .016, .062, .25, 1, 4, 16 and 64 ppm (b = 4, h = -3, 
k = 3) in groups of paired concentrate and blank so that p = 1/2. 


TABLE 2 
DISTRIBUTIONS OF APPARENT THRESHOLDS, X', AND j' = i' - h - 1/2 
IN TASTE TESTS OF ZINC CHLORIDE AND FERROUS SULFATE 


          Zinc chloride                        Ferrous sulfate 
 (b = 2, h = 1, k = 6, p = 1/6)       (b = 4, h = -3, k = 3, p = 1/2) 

    i'       j'     Frequency            i'       j'     Frequency 
 (X' in ppm)                          (X' in ppm) 

   <1.5     <0         0               <-2.5     <0         3 
    1.5      0         1                -2.5      0         2 
    2.5      1         2                -1.5      1         2 
    3.5      2         2                -0.5      2         2 
    4.5      3         3                 0.5      3         3 
    5.5      4         5                 1.5      4         3 
   >5.5     >4         4                 2.5      5         2 
                                        >2.5     >5         1 

   Total              17                Total              18 


Table 2 lists the distributions of observed values of $i' = \log_b X' + \tfrac{1}{2}$ 
and of $j' = i' - h - \tfrac{1}{2}$, where X' has been defined to be the apparent 
threshold in parts per million. Substituting the relative frequencies 
of j’ into the right-hand side of Equation (2), we obtain estimated 
values of the c.d.f. of true threshold concentrations. These are given 
in Table 3. 

Note that in the zinc chloride test the estimated c.d.f. value at 
$i_0 = 1$ is negative and worthless, a consequence of the zero frequency 
at j' < 0 (see Table 2). This situation did not arise in the ferrous 
sulfate test. 


TABLE 3 
CUMULATIVE DISTRIBUTIONS OF TRUE THRESHOLD CONCENTRATIONS 
BY NONPARAMETRIC METHOD 


          Zinc chloride                      Ferrous sulfate 

 X0 = 2^i0 ppm   Estimated c.d.f.      X0 (ppm)    Estimated c.d.f. 

       2             -.012               .016       [not legible] 
       4              .035               .062       [not legible] 
       8              .153               .25        [not legible] 
      16              .259              1.0         [not legible] 
      32              .412              4.          [not legible] 
      64              .718             16.          [not legible] 
                                       64.          [not legible] 


TABLE 4 
OBSERVED AND EXPECTED DISTRIBUTIONS OF v 

[Observed and expected frequency columns for zinc chloride and ferrous sulfate are not legible.] 

Zinc chloride (simple binomial): $\hat\pi = .618$, $\hat\theta = \hat\pi/q = (.618)(6/5) = .74$. 

Ferrous sulfate (binomial-gamma): $S_2 = .5185$, $S_1 = .6349$, $K = \log S_2/\log S_1 = 1.4459$, $\hat\beta = .443$, $\hat\alpha + 1 = .385$. 


Turning to the second, parametric method, Table 4 gives observed 
and expected frequencies of v, the number of groups incorrectly identified 
by each panel member. Listed below the distributions are the relevant 
sample statistics and estimates. A simple binomial satisfactorily 
describes the zinc chloride results. In the ferrous sulfate test, v showed 
considerably greater variation, reasonably well fitted, however, by the 
binomial-gamma model. Substitution of the calculated values of $\hat\theta$, 
$\hat\beta$ and $\hat\alpha + 1$ into their respective models yields the estimated cumulative 
probabilities shown in Table 5. 


TABLE 5 
CUMULATIVE DISTRIBUTIONS OF TRUE THRESHOLD CONCENTRATIONS 
ESTIMATED UNDER PROBABILITY MODEL 


      Zinc chloride                      Ferrous sulfate 
     (simple binomial)                   (binomial-gamma) 

 X0 (ppm)   Estimated c.d.f.        X0 (ppm)   Estimated c.d.f. 


Before discussing briefly the relative merits of the two methods of 
analysis, the goodness-of-fit test applied to a probability model may be 
noted. The criterion used has been the likelihood-ratio index, $-2\ln\lambda$ 
(e.g., Mood, 1950, p. 271), whose distribution closely approximates 
$\chi^2$ with N - 2 degrees of freedom in this case. In the zinc chloride 
experiment, $-2\ln\lambda$ equalled only 0.62 with five d.f.; in the ferrous 
sulfate test, $-2\ln\lambda$ equalled 3.36 with five d.f. Of the fifteen tests 
from which these two examples have been selected, three were analyzed 
according to the binomial-gamma form of f(n); in the other twelve 
cases, a simple binomial sufficed. 

Our experience has been that the cumulative threshold distribution 
estimated by the method using a probability model forms a smooth 
curve approximately linear on lognormal probability paper. The non- 
parametric estimates generally zig-zag about this curve, sometimes 
deviating substantially, as at the two highest ferrous sulfate concentrations 
(Tables 3 and 5). The nonparametric method is simpler but 
suffers the disadvantage of occasional missing points due to very small 
observed frequencies resulting in negative estimates which must be 
discarded. As shown in the following section, the greatest value of this 
method may lie in obtaining confidence limits when a simple binomial 
model is not adequate. 


Variances and confidence limits 


Equation (2), the estimating formula under the nonparametric 
method, is seen to be a linear function of the multinomial proportions, 
F(j'). Therefore, an approximate variance of the estimated c.d.f. at 
the point $h + j_0$ ($j_0 = 0, 1, 2, \ldots, k - h$) is given by a corresponding 
linear function of multinomial variances and covariances (e.g., Mood, 
op. cit., p. 214) in which the observed F(j') are substituted for their 
expected values. Similarly, an approximate variance may be obtained 
for any of the P-solutions in Equation (1). 

Symmetrical confidence limits based on these variances are not 
trustworthy for values of the estimated c.d.f. close to 0.1 or 0.9, but 
appear to be reasonably accurate for moderate values, say between .3 and .7. In 
this range, such limits may be useful in conjunction with the 
binomial-gamma model, as indicated below. A transformation of multinomial 
proportions analogous to the angular transformation of the binomial 
might be useful here to reduce the dependence of these variances on the 
unknown true proportions and thereby to permit more exact variance 
expressions. No such transformation yet seems to have been developed. 

Computing confidence limits for a c.d.f. derived from the 
distribution f(n) depends, of course, on the complexity of the model chosen. 
If f(n) is assigned a binomial form, we merely have to find limits for 
$\pi = \theta q$, the parameter in the binomial distribution of v. The estimate 
$\hat\pi$ is based on Nr trials, where N denotes the number of concentrations 
tested and r the panel size. Since Nr will almost surely be greater than 
50, confidence limits for $\pi$ may be given closely enough by 

$$\hat\pi \pm z_\alpha\sqrt{\frac{\hat\pi(1-\hat\pi)}{Nr}},$$

where $z_\alpha$ is the appropriate normal deviate. Dividing these limits by q 
yields lower and upper limits for $\theta$ (the latter may be unity if q is as 
small as 1/2). Since $n_s$, the number of chance judgments, increases in 
probability as a direct function of $\theta$, a lower (upper) limit to $\theta$ provides 
an upper (lower) confidence belt for the c.d.f. of the log threshold. 
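
As a minimal sketch of this calculation, assuming the normal approximation with standard error $\sqrt{\hat\pi(1-\hat\pi)/Nr}$ and truncating the limits for $\theta$ to the interval from 0 to 1: the function name `theta_limits` is illustrative, and the inputs ($\hat\pi = .618$, N = 6, r = 17, q = 5/6, normal deviate 1.645 for 90 percent limits) are taken as illustrative values for the zinc chloride test.

```python
from math import sqrt

def theta_limits(pi_hat, N, r, q, z=1.645):
    """Approximate confidence limits for theta = pi / q under the simple binomial model."""
    se = sqrt(pi_hat * (1 - pi_hat) / (N * r))   # standard error of pi_hat on N*r trials
    pi_low, pi_high = pi_hat - z * se, pi_hat + z * se
    # Divide by q, keeping theta within its admissible range [0, 1]
    return max(0.0, pi_low / q), min(1.0, pi_high / q)

lower, upper = theta_limits(pi_hat=0.618, N=6, r=17, q=5 / 6)
```

The lower limit for $\theta$ would then be used to draw the upper confidence belt for the c.d.f., and the upper limit the lower belt.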

Under the binomial-gamma form of f(n), the problem becomes much 
more difficult. Recalling that the mean of the gamma variable, $\delta_s$, is 
equal to $(\alpha + 1)/\beta$, and that small $\delta_s$ implies a value of $n_s$ approaching 
N, one might expect that increasing $\beta$ and reducing $\alpha + 1$, within the 
bounds of the goodness-of-fit test, would yield a lower confidence 
bound for the c.d.f. by raising the probabilities of large values of $n_s$. 
Conversely, reducing $\beta$ while increasing $\alpha + 1$ should raise the 
probabilities of small values of $n_s$ and thus produce an upper confidence 
curve for the c.d.f. 

These procedures appear to work as expected. Using the ferrous 
sulfate test as an example, the lowest curve in Figure 1 was obtained 
by increasing $\beta$ to 1.20 while reducing $\alpha + 1$ to .20, yielding a value of 
10.7 for $-2\ln\lambda$ ($\chi^2_{.05}$ with 5 d.f. = 11.1). The uppermost curve resulted 
from a reduction in $\beta$ to .23 and an increase in $\alpha + 1$ to .60, for which 
$-2\ln\lambda = 10.8$. 

These two curves represent approximate 90 percent confidence 
belts. For example, if the population of thresholds being sampled 
were that arising from repeated use of the panel in this ferrous sulfate 
test, we may be 90 percent sure that at least 8 percent but no more 
than 64 percent of such thresholds would fall at or below $4^0$ or 1 ppm. 
Such wide limits are not very satisfactory for administrative purposes 
and emphasize the extreme heterogeneity of response to this particular 
compound and solvent. Ninety percent confidence intervals for the 
c.d.f. of zinc chloride thresholds are substantially narrower, as would be 
expected under the simple binomial form of f(n). 

Laborious trial-and-error calculations are required to obtain confi- 
dence limits under the binomial-gamma model. The nonparametric 
method offers a good alternative procedure in this situation. For 
example, using multinomial variance and covariance formulae, the 
approximate standard error of the estimated c.d.f. at 1 ppm ferrous 
sulfate is .168. Adding and subtracting 1.65 (.168) to the binomial- 
gamma estimate .354 (Table 5) yields 90 percent confidence limits 
almost identical with those given above. In general, one might proceed 
by selecting a concentration point (near the middle of the series) where 
the nonparametric and binomial-gamma c.d.f. values are close. The 
standard error at this point may then be estimated under the former 
method and confidence limits obtained by adding and subtracting a 
normal multiple of this error to the binomial-gamma estimate. Finally, 
curves through these limits parallel to the estimated c.d.f. may be 
drawn to yield confidence belts sufficiently accurate for practical 
purposes. 
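
The arithmetic of the example above can be checked with a minimal sketch; the function name `normal_limits` is illustrative, and the inputs are the binomial-gamma estimate .354 and the nonparametric standard error .168 quoted in the text.

```python
def normal_limits(estimate, std_error, z=1.65):
    """Limits formed by adding and subtracting a normal multiple of the standard error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width

low, high = normal_limits(estimate=0.354, std_error=0.168)
```

The resulting limits, roughly .08 and .63, agree with the 8 and 64 percent read from the trial-and-error confidence belts.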

Summarizing, the simplest parametric method calls for applying a 
binomial model to the distribution of incorrect identifications, provided 
the data permit this simple form. If not, a compound binomial-gamma 
model will probably show a satisfactory fit and from this distribution 
an estimated c.d.f. of true thresholds may be obtained. In this case, 
however, the nonparametric method offers the simplest way of comput- 
ing confidence limits. 


Test design and panel size 


We might consider briefly the possibility of narrowing the distance 
between confidence bounds by reducing p, the chance of correct assign- 
ment at a concentration below the threshold, or by increasing the panel 
size r. A problem often arises here since reducing p requires extra time 
to prepare and examine additional amounts of material at each con- 
centration and may force a reduction in the number of subjects which 
can be accommodated during a given time period. 

The binomial model provides a simple criterion for judging the 
effects of changes in p and r. In this case, narrowing the distance 
between confidence bounds for the c.d.f. means narrowing the confidence 
interval for $\theta$. The width of this interval is 

$$W = \frac{2z_\alpha}{q}\sqrt{\frac{\theta q(1-\theta q)}{Nr}},$$

where q = 1 - p. Assume a new panel composed of r' members under an 
experimental design p', but characterized by the same value of $\theta$ as 
an original panel, i.e., composed of individuals whose average threshold 
is identical with that of the earlier group, an obviously important 
proviso. Then the relative width of the new interval compared to the 
original is, say, 

$$\left(\frac{W'}{W}\right)^{2} = \frac{q}{q'}\cdot\frac{r}{r'}\cdot\frac{1-\theta q'}{1-\theta q}.$$
Consider an example in which an increase in p is desired in order 
to speed completion of a subject’s test series. Suppose an increase in 
p from 1/6 to 1/2 saves at least 30 percent in the time necessary to 
conduct a single test. Suppose, further, that this saving would allow 
us to enlarge our panel by 1/3, while maintaining equal average sen- 
sitivity. Are these changes justifiable in terms of the precision with 
which the c.d.f. may be estimated? 

Using the above criterion with q/q' = 5/3 and r/r' = 3/4, 

$$\left(\frac{W'}{W}\right)^{2} = 1.25\,\frac{6-3\theta}{6-5\theta}.$$

For all possible values of $\theta$, the c.d.f. would be estimated less precisely 
under the new plan (p = 1/2) than under the original design. In fact, 
for $\theta$ > .5, the panel size would have to be more than doubled before 
equal or improved precision would be obtained. 
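
The comparison can be tabulated with a minimal sketch, assuming the interval width is proportional to $\sqrt{\theta q(1-\theta q)/Nr}\,/q$ with N unchanged, so that the squared relative width is $(q/q')(r/r')(1-\theta q')/(1-\theta q)$. The function names are illustrative.

```python
def relative_width_squared(theta, q, q_new, r, r_new):
    """(W'/W)^2 for a change of design from (q, r) to (q_new, r_new), N unchanged."""
    return (q / q_new) * (r / r_new) * (1 - theta * q_new) / (1 - theta * q)

def panel_ratio_for_equal_precision(theta, q, q_new):
    """Panel enlargement r_new / r at which the new interval is as narrow as the old."""
    return (q / q_new) * (1 - theta * q_new) / (1 - theta * q)

# Change p from 1/6 to 1/2 (q from 5/6 to 1/2) with the panel enlarged by one third
ratios = [relative_width_squared(t / 10, 5 / 6, 1 / 2, 3, 4) for t in range(11)]
needed = panel_ratio_for_equal_precision(0.5, 5 / 6, 1 / 2)
```

Every entry of `ratios` exceeds 1, and `needed` exceeds 2 at $\theta$ = .5, in line with the conclusions stated above.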

The writer wishes to acknowledge his debt to the referees for their 
helpful comments and advice. 


AN UNBIASED SAMPLING AND ESTIMATION PROCEDURE 
FOR CREEL CENSUSES OF FISHERMEN¹ 


D. S. Robson 


Cornell University, 
Ithaca, New York, U.S.A. 


INTRODUCTION 


The management of many sports fisheries is partly dependent upon 
information obtained through a creel census of the fishermen. Various 
types of data are collected in a creel census, including number of fish 
caught, amount of fishing time expended on the catch, proportion of 
marked or tagged fish in the catch, and morphometric data on the 
captured fish. The primary objective, however, is ordinarily con- 
sidered to be the estimation of fishing mortality, or total number of 
fish removed by fishermen, since information on proportion of marked 
fish and on the morphometric characteristics of the fish population is 
obtainable by other, more efficient field methods. We shall, therefore, 
restrict our attention here to the problem of estimating total catch and 
total fishing effort. 

A complete census of fishermen over a lake or stream is almost 
impossible to obtain due to practical limitations on the number of field 
personnel; consequently, creel censuses are usually designed as sample 
censuses. A ratio-type method of estimation frequently employed to 
estimate the total catch consists essentially of applying an estimated 
catch rate (number of fish caught per fishing hour) to the estimated 
total effort. This method of estimation exploits the facts that number 
of fish caught per fisherman, as measured by individual fisherman 
interviews, is positively correlated with the number of hours fished, 
and that the total number of hours fished by all fishermen can be 
estimated from easily obtained counts of the number of fishermen 
present at randomly chosen times during the day. The sampling 
design to be described here is constructed specifically to provide the 
data for an unbiased ratio-type estimate of total catch. Such a design 
will, of course, also provide the data for the unbiased estimation of 
total fishing effort and of the sampling error variance of the estimators. 


¹Prepared in connection with research sponsored by the National Science Foundation. 


DESIGN OF THE SAMPLE 


The population of fishermen present on the given lake or stream 
through the fishing season is considered to be stratified through space 
and time. The total area of the fishery is partitioned into N geographic 
segments in such a way that each segment supports approximately the 
same amount of fishing. A fishing season is stratified into weeks, and 
the weeks further stratified into three periods consisting of the five 
week-day period and each of the two weekend days. Fishing pressure 
varies systematically through the season and is much heavier on week- 
ends than on weekdays; the stratification through time removes this 
systematic variation from the sampling error variance. Week-day 
holidays would also be separated into individual strata. 

Within a given week-day period including, say, D days, the creel 
census is to be conducted on each of d randomly selected days. A crew 
of n enumerators conducts the census on n randomly selected area seg- 
ments, each enumerator spending the entire day in his assigned area 
keeping a continuous record of the number of fishermen present and 
recording each man’s catch from the area. The n sample segments are 
selected without replacement but independently on each of the d days. 

Counts of the number of fishermen present in the entire fishery are 
to be made by an (n + 1)st man on each of the d selected census days. 
His procedure is to traverse the fishery systematically but starting at a 
randomly chosen place and a randomly chosen time, counting all 
fishermen as he proceeds. A relatively rapid means of transportation 
is employed for the counting trip so that K complete circuits of the 
fishery could be made by continuous travel throughout the fishing day. 
For the sample, k of these K possible starting times are randomly 
selected without replacement. If fishing pressure is known to be 
unevenly distributed through the fishing day then stratification of this 
sample of starting times should be used; for example, the day might be 
partitioned into a morning stratum and an afternoon stratum, with a 
sample of $k_1$ being chosen from the $K_1$ possible starting times in the 
morning and $k_2$ from the $K_2$ possibilities in the afternoon. 
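
The selection steps described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not a field protocol: the function name `draw_census_plan`, the dictionary layout, and the representation of a trip's starting place as a fraction of the route are all choices made for the sketch, and the unstratified choice of starting times is used.

```python
import random

def draw_census_plan(D, d, N, n, K, k, seed=None):
    """Draw census days, enumerated segments and counting trips for one stratum."""
    rng = random.Random(seed)
    plan = {}
    # d census days chosen without replacement from the D days of the stratum
    for day in sorted(rng.sample(range(1, D + 1), d)):
        # k of the K possible starting periods, without replacement
        periods = sorted(rng.sample(range(1, K + 1), k))
        plan[day] = {
            # n segments, drawn without replacement and independently on each day
            "segments": sorted(rng.sample(range(1, N + 1), n)),
            # each trip starts at a random place, here a fraction of the route
            "trips": [(h, rng.random()) for h in periods],
        }
    return plan

plan = draw_census_plan(D=5, d=2, N=10, n=3, K=6, k=2, seed=1)
```

A stratified choice of starting times would replace the single draw of `periods` by separate draws within the morning and afternoon periods.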


SYMBOLIC DESCRIPTION OF THE POPULATION AND SAMPLE DATA 


The N area segments of the fishery are regarded as being numbered 
from 1 to N so that a random sample of n segments is obtained 
by randomly choosing a combination of n integers from the first N 
integers. Such a combination of n integers will be denoted generically 
by the symbol $J_n$; there are then $\binom{N}{n}$ different subsets $J_n$ of the set 
(1, ..., N), each of which is equally likely to be chosen for the sample. 
Similarly, $I_d$ will be used to denote a combination of d integers chosen 
from (1, ..., D) and $H_k$ denotes a subset of k integers from the set 
(1, ..., K). A set notation of this sort is required for a precise 
description of samples from a finite population. 

The information being sought for a particular portion (stratum) of 
the fishing season includes the total number X of man hours fished and 
the total number Y of fish caught. In terms of the population structure 
defined above, 

$$X = \sum_{i=1}^{D}\sum_{j=1}^{N} X_{ij},$$

where $X_{ij}$ is the total number of man hours fished on the i'th day at the 
j'th segment and Y is similarly defined in terms of the corresponding 
$Y_{ij}$. The number of man hours $X_{ij}$ may, in turn, be expressed as the 
integral of a counting function $c_{ij}(t)$ over the entire day, where $c_{ij}(t)$ 
is the number of fishermen present at time t in the j'th segment on day i. 
As indicated in Figure 1, $c_{ij}(t)$ is a step function with discontinuities 
at the points in time when fishermen enter or depart from the stratum. 
Time is most conveniently measured from an origin defined by the 
start of the fishing day, and the length of the fishing day is assumed to 
be an integer (= K) multiple of the number of hours required, say $\beta$ 
hours, for the completion of a counting trip; consequently, $0 \le t \le K\beta$. 


[Step-function graph; time axis marked at 0, $\beta$, $2\beta$, $3\beta$.] 
FIGURE 1. 
AN EXAMPLE ILLUSTRATING THE FORM OF THE COUNTING FUNCTION $c_{ij}(t)$ 


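Since $X_{ij}$ is the area under the graph of $c_{ij}(t)$, 

$$X_{ij} = \int_{0}^{K\beta} c_{ij}(t)\,dt.$$

If $t_1, \ldots, t_m$ are the points in time when fishermen enter or leave the 
segment, then 

$$X_{ij} = \sum_{\nu=0}^{m} c_{ij}(t_\nu)\,(t_{\nu+1} - t_\nu),$$

where $t_0 = 0$, $t_{m+1} = K\beta$, and where $c_{ij}(t_\nu)$ is the number of fishermen 
present during the interval from $t_\nu$ to $t_{\nu+1}$. 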
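
Because the counting function is a step function, the man hours in a segment reduce to a sum of count times duration. A minimal sketch, assuming an enumerator's record is available as a sorted list of the times at which the number present changes and the list of counts holding between those times (the function name `man_hours` and this record layout are illustrative):

```python
def man_hours(change_times, counts, day_length):
    """Area under a step counting function over a fishing day of length day_length.

    change_times: sorted times at which fishermen enter or leave the segment.
    counts: one more entry than change_times; counts[v] is the number present
            between the v'th and (v+1)'th break, the day's start and end included.
    """
    if len(counts) != len(change_times) + 1:
        raise ValueError("counts must have one more entry than change_times")
    edges = [0.0] + list(change_times) + [day_length]
    return sum(c * (b - a) for c, a, b in zip(counts, edges[:-1], edges[1:]))

# Nobody for 1 hour, two fishermen for 2 hours, three for 2 hours, one for 1 hour
effort = man_hours(change_times=[1.0, 3.0, 5.0], counts=[0, 2, 3, 1], day_length=6.0)
```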

The information assembled in the sample creel census conducted 
by the n enumerators includes, for each of d randomly chosen days, the 
effort $X_{ij}$ and the catch $Y_{ij}$ in n randomly chosen sample segments. 
This may be rephrased in terms of the set notation defined earlier to 
state that $X_{ij}$ and $Y_{ij}$ are observed for each i and j such that i belongs 
to $I_d$ and j belongs to $J_n$. 

Also included in the sample are the fisherman counts made on k 
counting trips during each of the d census days. The k starting times 
are selected at random and without replacement from the possible 
times $t = 0$, $t = \beta$, $t = 2\beta$, ..., $t = (K-1)\beta$. A counting trip during 
the h'th time interval on day i will reach the j'th area segment at some 
time t, $(h-1)\beta < t < h\beta$, resulting in an observed count $c_{ijh}$. The 
total number of fishermen observed on all k trips on day i may then be 
expressed as 

$$\sum_{h\in H_k}\sum_{j=1}^{N} c_{ijh}.$$


ESTIMATION OF TOTAL FISHING EFFORT 


A counting trip starting at time $(h-1)\beta$ on day i is begun at a 
randomly chosen place on the route; this ensures that the j'th area 
segment is "equally likely" to be visited at any time during the interval 
from $(h-1)\beta$ to $h\beta$. The time t at which the j'th segment is counted 
is therefore a uniformly distributed chance variable on this interval, 
and the count $c_{ijh}$ is a chance variable having an expected value of 

$$E(c_{ijh}) = \frac{1}{\beta}\int_{(h-1)\beta}^{h\beta} c_{ij}(t)\,dt$$

for a fixed i, j and h. This function is, except for the factor $1/\beta$, the 
area under the graph of $c_{ij}(t)$ between $(h-1)\beta$ and $h\beta$; hence, $\beta c_{ijh}$ is 
an estimate of the total man hours fished, say $X_{ijh}$, in the j'th segment 
during the time $(h-1)\beta$ to $h\beta$ on day i. The starting time is also 
chosen at random, however, and the expected value of the estimate 
$\beta c_{ijh}$ over all possible starting times is 

$$\frac{1}{K}\sum_{h=1}^{K} X_{ijh} = \frac{X_{ij}}{K}.$$

Consequently, an unbiased estimate of $X_{ij}$ is 

$$\hat X_{ij} = \frac{K\beta}{k}\sum_{h\in H_k} c_{ijh},$$

and an unbiased estimate of the total effort $X_i$ on the i'th census day is 

$$\hat X_i = \sum_{j\in J_n} X_{ij} + \sum_{j\in J_{N-n}} \hat X_{ij}.$$

Finally, since the d census days were also randomly selected, an 
unbiased estimate of the total fishing effort X over all D days is 

$$\hat X = \frac{D}{d}\sum_{i\in I_d}\hat X_i.$$

A noteworthy feature of the mechanics of this estimation procedure 
is that the counting man, himself, need not record his count by area 
segment if the time at which he reaches each of the n sample segments 
being censused is recorded. The continuous records of the census 
enumerators will give the counts $c_{ijh}$ at the recorded times of visit, 
and subtraction of these from the count man's total will give the desired 
sum of the $c_{ijh}$ over the N - n segments numbered in $J_{N-n}$. 
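
The computation can be sketched as follows, under the assumption that the daily estimate adds the man hours recorded exactly in the n enumerated segments to $K\beta/k$ times the counts summed over the remaining N - n segments and the k trips, and that the daily estimates are then expanded by D/d. The function names and the record layout (`censused_effort`, `uncensused_counts`) are illustrative.

```python
def estimate_day_effort(censused_effort, uncensused_counts, K, beta):
    """Estimated total man hours for one census day.

    censused_effort: man hours recorded by the enumerators, one per sampled segment.
    uncensused_counts: for each counting trip, the count man's total minus the
                       counts in the enumerated segments at the times of his visit.
    """
    k = len(uncensused_counts)
    return sum(censused_effort) + (K * beta / k) * sum(uncensused_counts)

def estimate_total_effort(day_records, D, K, beta):
    """Expand the daily estimates from the d census days to all D days."""
    d = len(day_records)
    daily = [
        estimate_day_effort(rec["censused_effort"], rec["uncensused_counts"], K, beta)
        for rec in day_records
    ]
    return (D / d) * sum(daily), daily

# Toy check: 4 segments with 2 fishermen present all day, K = 3 trips of beta = 2 hours,
# so each segment carries 12 man hours and the true 5-day total is 5 * 4 * 12 = 240.
records = [{"censused_effort": [12.0], "uncensused_counts": [6, 6]}] * 2
total, daily = estimate_total_effort(records, D=5, K=3, beta=2.0)
```

In this constant-count toy case the estimate reproduces the true total exactly; with real records it would vary from sample to sample.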

Sampling error in $\hat X$ is seen to arise from several sources 
corresponding to the several stages of sampling. First, there is the error in $\beta c_{ijh}$ 
as an estimate of $X_{ijh}$; second, ignoring this first error, there would 
still be error in 

$$\frac{K}{k}\sum_{h\in H_k} X_{ijh}$$

as an estimate of $X_{ij}$; and finally, ignoring these errors, there would 
still be error in 

$$\frac{D}{d}\sum_{i\in I_d} X_i$$

as an estimate of X. The presence of these three kinds of error in the 
error of the estimate $\hat X - X$ may be illustrated algebraically by the 
identity 

$$\hat X - X = \frac{D}{d}\sum_{i\in I_d}\sum_{j\in J_{N-n}}\frac{K}{k}\sum_{h\in H_k}\left(\beta c_{ijh} - X_{ijh}\right) + \frac{D}{d}\sum_{i\in I_d}\sum_{j\in J_{N-n}}\left(\frac{K}{k}\sum_{h\in H_k} X_{ijh} - X_{ij}\right) + \left(\frac{D}{d}\sum_{i\in I_d} X_i - X\right).$$

The error variance of $\hat X$, Var$(\hat X) = E(\hat X - X)^2$, is, likewise, made 
up of three components since the three error of estimate components 
defined above are uncorrelated, and each has mean 0. Thus, 

$$\mathrm{Var}(\hat X) = V_1 + V_2 + V_3.$$

The third component of variance, due to fishing pressure differences 
between days, is easily seen to have the familiar form 

$$V_3 = \frac{D(D-d)}{d}\cdot\frac{1}{D-1}\sum_{i=1}^{D}\left(X_i - \frac{X}{D}\right)^{2}.$$

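The second component $V_2$, due to variation among periods within 
days, takes a more complicated form 

$$V_2 = \frac{DK(K-k)(N-n)}{dkN}\sum_{i=1}^{D}\left[\sum_{j=1}^{N}\sigma_{ij}^{2} + \frac{N-n-1}{N-1}\sum_{j\neq j'}\sigma_{ijj'}\right],$$

where $\sigma_{ij}^{2}$ is the variance among periods in the i'th day and j'th segment 

$$\sigma_{ij}^{2} = \frac{1}{K-1}\sum_{h=1}^{K}\left(X_{ijh} - \frac{X_{ij}}{K}\right)^{2},$$

and $\sigma_{ijj'}$ is the covariance between the number of hours fished per 
period in the two segments j and j' on day i, 

$$\sigma_{ijj'} = \frac{1}{K-1}\sum_{h=1}^{K}\left(X_{ijh} - \frac{X_{ij}}{K}\right)\left(X_{ij'h} - \frac{X_{ij'}}{K}\right).$$

The first component $V_1$, due to variation within a $\beta$-hour period, is 

$$V_1 = \frac{DK(N-n)\beta^{2}}{dkN}\sum_{i=1}^{D}\sum_{h=1}^{K}\left[\sum_{j=1}^{N}\sigma_{ijh}^{2} + \frac{N-n-1}{N-1}\sum_{j\neq j'}\sigma_{ijj'h}\right],$$

where $\sigma_{ijh}^{2}$ is the variance, through time, of the number of fishermen 
present in segment j between the times $(h-1)\beta$ and $h\beta$ on day i, 

$$\sigma_{ijh}^{2} = \frac{1}{\beta}\int_{(h-1)\beta}^{h\beta} c_{ij}^{2}(t)\,dt - \left(\frac{X_{ijh}}{\beta}\right)^{2}.$$

Notice that 

$$\sum_{h=1}^{K}\sum_{j=1}^{N}\sigma_{ijh}^{2} = \frac{1}{\beta}\sum_{j=1}^{N}\int_{0}^{K\beta} c_{ij}^{2}(t)\,dt - \frac{1}{\beta^{2}}\sum_{j=1}^{N}\sum_{h=1}^{K} X_{ijh}^{2}.$$

The covariance $\sigma_{ijj'h}$ between the fisherman counts in segments j and 
j' in the h'th interval of day i is dependent upon the distance between 
the two segments in terms of travel time. If $\beta_{jj'}$ is the amount of time 
required to travel from segment j to segment j', then 

$$\sigma_{ijj'h} = \frac{1}{\beta}\left[\int_{(h-1)\beta}^{(h-1)\beta+\beta_{jj'}} c_{ij'}(t)\,c_{ij}(t+\beta-\beta_{jj'})\,dt + \int_{(h-1)\beta+\beta_{jj'}}^{h\beta} c_{ij'}(t)\,c_{ij}(t-\beta_{jj'})\,dt\right] - \frac{X_{ijh}X_{ij'h}}{\beta^{2}}.$$

The first integral accounts for the cases where the counting trip is 
started at a point on the route from segment j to segment j', and the 
second integral covers the cases in which the trip is started between 
segment j' and segment j. 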

Estimation of Var$(\hat X)$ may also be carried out in essentially three 
steps corresponding to the three components of error variance. Firstly, 
the statistic 

$$S_X = \frac{D(D-d)}{d(d-1)}\sum_{i\in I_d}\left(\hat X_i - \frac{\hat X}{D}\right)^{2}$$

is an unbiased estimate of Var$(\hat X) - d(V_1 + V_2)/D$; and secondly, 
the statistic 


K K 


h=1 hel 
1 
K ( > X ;;) + ij 


is an unbiased estimate, term for term, of the component $V_2$. For 
computational purposes, this estimator may be collapsed to 


> DIK(N — n)(k — { | > 


however, the correspondence between the terms of the estimate and the 
terms of the parameter $V_2$ is lost by this reduction. The within-period 
component of error variance $V_1$ may be estimated in several ways 
without bias but with varying degrees of precision and correspondingly 
varying degrees of computational difficulty. The preferred, more 
precise estimate is 


heal 
N K (h-1)8+B; 5" 
h=1 (h-1)8 
nB 
+ 8B — B;;-) dt — 
(h-1)B+B; 


Since the functions $c_{ij}(t)$ are step functions, all integrals in this estimate 
may be expressed (and computed) as summations. The covariance 
term in $\hat V_1$ involves dn(n - 1) integrals, each of which essentially 
requires the construction of two graphs, a graph of the function 
$c_{ij'}(t)c_{ij}(t+\beta-\beta_{jj'})$ and a graph of $c_{ij'}(t)c_{ij}(t-\beta_{jj'})$. A graph of 
$c_{ij'}(t)c_{ij}(t+\beta-\beta_{jj'})$ may be constructed by first superimposing the 
graph of $c_{ij}(t)$ on the graph of $c_{ij'}(t)$, with the graph of $c_{ij'}(t)$ translated 
$\beta - \beta_{jj'}$ units to the right as illustrated in Figure 2. The graph of the 
product is then easily obtained as shown at the bottom of Figure 2 and 
the area under the graph computed in a straightforward manner. A 
graph of $c_{ij'}(t)c_{ij}(t-\beta_{jj'})$ is obtained in a similar manner, translating 
$c_{ij'}(t)$ to the left $\beta_{jj'}$ units. 


[Step-function graphs; time axis marked at 0, $\beta$, $\beta + \beta_{jj'}$, $2\beta$, $3\beta$.] 
FIGURE 2. 
ILLUSTRATIONS OF THE STEP FUNCTIONS $c_{ij}(t)$ 
The top figure shows the graph of $c_{ij}(t)$ superimposed on the graph of $c_{ij'}(t)$ with the latter 
translated $\beta - \beta_{jj'}$ units to the right. The lower figure plots the product of these two functions in their 
translated position, giving $c_{ij'}(t)c_{ij}(t+\beta-\beta_{jj'})$ for the intervals $(h-1)\beta$ to $(h-1)\beta + \beta_{jj'}$. 

If n is large, then the number of such graphs, being of the order $n^2$, 
becomes prohibitive. An alternative estimator which utilizes the 
counts $c_{ijh}$ made by the counting man to estimate the covariance term is 


D*K(N — { _< 2 
Vi= akn B [ c;;(t) dt X iin 


This statistic is also an unbiased estimator of the within-period 
component of the error variance, but is subject to a much larger sampling 
error than the estimator $\hat V_1$. 

An unbiased estimator of the sampling variance of X may now be 
formed by combining the three statistics S_X, V_1, and V_2 into 


Ver (X) = Sx + + 


In addition, an estimator of the between-day component of the sampling 
variance is available now in the form of 


V_3 = S_X − ((D − d)/D)(V_1 + V_2). 


The purpose of estimating sampling variance is, of course, to provide 
some measure of the precision of the estimates X. Ordinarily, this 
measure is provided by the confidence interval 


X ± 2√Var(X) 


based upon the assumption that X is approximately normally dis- 
tributed. Variance component estimates also permit the experimenter 
to estimate the change in precision he could expect by changing the 
sampling rate at any or all stages of the sampling process. 
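
As a minimal sketch of the interval just described, the following Python function forms the estimate plus and minus two standard errors under the stated normal approximation. The function name and the figures in the usage lines are illustrative assumptions, not census values.

```python
import math


def confidence_interval(estimate, variance, multiplier=2.0):
    """Return (lower, upper) = estimate -/+ multiplier * sqrt(variance)."""
    if variance < 0:
        # Unbiased variance estimators built from differences can come out negative.
        raise ValueError("variance estimate must be non-negative")
    half_width = multiplier * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


# Hypothetical figures: an effort estimate of 100 units with estimated variance 25.
lower, upper = confidence_interval(100.0, 25.0)
print(lower, upper)
```

The multiplier of 2 is the one used in the text; a different normal deviate can be passed if another confidence level is wanted.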


ESTIMATION OF THE TOTAL CATCH 


The purpose of constructing an unbiased estimator X; of the total 
effort X,; on the 7’th census day is primarily to provide the data for a 
ratio estimate of the total catch on day 7. The observed catch-rate in 
the sample of n area segments, multiplied by the estimate X, , forms an 
estimate of the total catch Y,; . This ratio estimate of Y, is biased, 
however; and, as pointed out by Goodman and Hartley [1], the bias 
may be substantial. An alternative, unbiased ratio-type estimator has 




270 BIOMETRICS, JUNE 1960 


been constructed by Hartley and Ross [2] for the special case where the 
X_i is known; in the present notation, this Hartley-Ross estimator has 
the form 


Y;; N 


n sr Xi; Je Jn) 


Since our estimator X; is unbiased and is, furthermore, statistically 
independent of the mean ratio 


then the estimator 


Jn 


is also unbiased. An unbiased estimate of the total catch Y for the 
entire D days is then 


alS 


Again, the error of estimate may be expressed as a sum of uncorrelated 
error components with the result that sampling error variance is also 
made up of variance components. Firstly, we observe that 


+(2 Ey, - 
d Ta d Ta 
or, letting = D,, , then 
D 
Panta 20%. +f ¥). 
Ta 
Next, recalling the form of the Hartley-Ross estimator Y,; , we see that 
N 
ait i=l 


or, since X;, = X;,; for all j belonging to J{", 
Finally, from the previous section, we have 


so that 


D 
+= (X,; — X;;) 


+9 - F). 


The four components of the error of estimate each have an expected 
value of 0 and, while they are not statistically independent, they are 
uncorrelated with one another. Consequently, the sampling variance 
of Y is expressible as the sum of four components, say 


Var (Y) = C_1 + C_2 + C_3 + C_4. 


The last component C, is simply 


C,=E(¥ yy - Al v.) |. 


t=1 i=l 


and, as in the previous section, this component is ultimately estimated 
from the corresponding statistic 


D(D — 
Sy = “‘d(d — 1) [= 
which is itself an estimate of 
E(Sy) = Var (Y) — +C;+C,). 


Thus, once the estimates of C_1, C_2 and C_3 become available, C_4 may 
be estimated from S_Y by 


4640). 
The third component is expressible as 
where Var (Y;) is the variance of a Hartley-Ross ratio estimator for 
the z’th day. Goodman and Hartley give this variance in its limiting 
form as 


lim = 1 (Y,,) + R? Var (X,,) — 2R; Cov (X,; , 


+ n 1 [Var (X,;) Var (r,;) Cov (X,, 


where R_i is the mean value of the ratio r_ij = Y_ij/X_ij in the population. 
The exact variance of the Hartley-Ross estimator for the case N finite 
has been computed by Robson [3] and expressed in terms of multivariate, 
multipart cumulants which Tukey [4] calls polykays. Polykays are 
denoted by a set of vectors enclosed in square brackets, with the 
(integer) elements of the vectors representing the degree of the cumulant 
in each of the variables. In the present case the vectors contain three 
elements corresponding to the degree of each of the three variables 
X_ij, Y_ij and r_ij. For example, the first cumulant of r_ij is expressed as 
R = [(001)]', the variance of r_ij, Var (r_ij) = Σ(r_ij − R)²/(N − 1), 
is denoted by [(002)]', and the covariance between X_ij and r_ij is written 
[(101)]'. For the purpose at hand we shall merely use polykays as a 
convenient notation for reducing very tedious and uninteresting formulas 
into a compact form; the reader is referred to the cited papers by Tukey 
and Robson for the details of the algebra involved. The variance of 
Y, as expressed in terms of polykays is 


VW =") + — 2{(001)(110)’ 


Vin — 1) + jaonaony} 


The subscript 7 has been omitted in this formula. Each term appearing 
in the formula represents a rather formidable polynomial function of 
the moments, as will be seen below in the estimation formulas; for 
interpretive purposes, however, it may be noted that as N gets large 
this variance approaches term by term to the limiting value given 
earlier. 

An unbiased estimate of Var (Y;) is obtained by substituting into 
the above formula the unbiased estimates of the polykays. Computing 
formulas for these estimates are given below. All subscripts have been 
omitted, and sums are understood to extend over the set J{"” of sample 
segment numbers; in addition, the product n(n − 1) ··· (n − S + 1) 
is abbreviated to (n)_S. 
7" 
[.001)(001)(200)] = ia Yr 


vy’ 
> X yr + (> XY 
+40 xX DY 


((001)(110] = (a -1) &xy 
+X 


+ 22-1) UX 


+ 9"). 

[(101)(101)] 


These same expressions with n replaced by N and with sums extending 
over the entire range, 1 to N, define the population polykays appearing 
in the formula for Var (Y_i). With these computing formulas an 
unbiased estimate Var (Y_i) is obtained for each of the d sample days to 
give the following unbiased estimate of C_3 


D 


The second component C, of error variance is expressible in terms of 
the between-period variances and covariances o;;;- defined earlier 
for expressing the variance component V, ; thus, 


c, = DK(K 0) 


~ dknN(N — 1)(N — 2) ru) 


Feet] 
+(N-n—- p> — (N — 2n) 


= DEK — — n-1 < 


i’ 


é 2 
+ 2 > 


— 1)(N — 1) | 


An unbiased estimator of C, , which is unbiased term by term according 
to the second formula for C, , is 


D’K(K — —n) 


(LAM 


N-n-1 2 2 
+ (n — 1)(n — 2) p> 


where all sums extend over the set J‘ for day 7. Computing formulas 


were given earlier for the terms }-0?; and )o;;;- ; the only additional 
formulas now sage are 
where, again, all sums extend over J{'’ unless otherwise indicated. 
The first component of error variance, 


is expressible in terms of the variances o7,, and covariances o;;;-, within 
periods as 


| 


n-1 
+= 


A term by term unbiased estimator of C, is then 
where all sums extend over the set J{° for day 7. Notice, for computing 
purposes, that the sum over the index h may be performed first; for 
example, 


K K 
h=1 Jnf*) J, h=1 


Kg K 
0 h=1 


The availability of estimators of the four components of error variance 
now permits the estimation of Var (Y) in the form of Var (Y) = C_1 + 
C_2 + C_3 + C_4, and the effect on sampling error variance resulting from 
varying the sampling rates may now be estimated. 


DISCUSSION 


The sampling and estimation procedure described here has not been 
field tested to confirm its workability, and though the plan was derived 
in consultation with fishery biologists at Cornell University, we recog- 
nize that it contains some serious practical disadvantages and limitations. 
Creel census enumerators stationed in well defined area segments are 
required to record the time of entry and exit of each fisherman using the 
segment, and also to record each fisherman’s total catch from the seg- 
ment. These requirements limit the size of an area segment which an 
enumerator can effectively patrol and render the plan impracticable in 
situations where fishermen continuously move about, entering and re- 
entering the segment. If, on the other hand, the turnover of fishermen 
in an area segment is very low then the census taker will be standing 
idle for a large part of the day, with perhaps an overwhelming flurry 
of activity at the end of the day. This inefficient use of the time of field 
personnel represents what is probably the chief disadvantage of the 
present scheme as compared to the conventional type of creel census 
in which enumerators themselves move about freely, contacting as 
many fishermen as possible. Unfortunately, while it is intuitively 
clear that moving enumerators are more efficient, it is not intuitively 
clear how the data they collect should be utilized, and, in fact, as yet 
there does not exist an efficient or even satisfactory method of esti- 
mation associated with this efficient data collecting process. 

The proposed plan deliberately sacrifices this efficiency of operation 
in the field for the purpose of establishing complete objectivity in the 
data and permitting the efficient use of the data in estimation procedures. 
The unbiased property is the main advantage of the scheme; it is un- 
biased in the statistical sense that, given the data called for in the 
sampling model, the error of the estimate has theoretical expectation 
equal to 0, and it is unbiased in the sense of interviewer and respondent 
bias since none of the information called for is dependent upon the 
memory or attitudes and opinions of either party. While unbiasedness 
in itself is not considered an essential characteristic of an estimator, 
sampling plans which permit unbiased estimation are certainly pre- 
ferable to those which incorporate unknown and subjective biases into 
all methods of estimation. Under the present scheme it is not unlikely 
that biased ratio estimators of simple construction would serve the 
purpose equally well, but only because the sampling plan is such as to 
render the parameters of the population estimable. 

Another apparent disadvantage of this scheme is the complexity of 
the computational procedure. The bulk of this difficulty centers on 
variance estimation and only a part of this labor can be blamed on the 
unbiased property of the estimator; the difficulty is due largely to the 
existence of several stages in the sampling model and is an inevitable 
consequence of the nature of the creel census problem. We are esti- 
mating the cumulative effect of a process (fishing) which operates 
through two dimensions, time and space, and are forced to sample in 
both of these dimensions. Under these circumstances, variance esti- 
mation cannot fail to pose an awkward problem. 

Fishery biologists in general recognize the importance of attaching 
a measure of accuracy to point estimates, and strive always to devise 
sampling procedures which allow the estimation of error variance. The 
availability of estimates of error variance and its components may 
therefore be considered a second advantage of this procedure, stemming 
from the unbiased property. A disadvantage of the present scheme in 
this respect is that estimability of error variance requires minimum 
sample sizes of d ≥ 2, k ≥ 2 and n ≥ 4; that is, 4 or more census enumerators 
are required to work 2 or more of the D days, and the counting 
man is required to make 2 or more trips on each of these 2 or more days. 
If the fishery is large enough or if several fisheries in an area are being 
surveyed then the use of 5 or more field men may be justified; their 
effort, in this case, could be distributed so as to keep them occupied 
during the entire D-day period with their days randomly allotted among 
the several fisheries or among strata of the large fishery. 


REFERENCES 


[1] Goodman, L. A. and Hartley, H. O. [1958]. The precision of unbiased ratio-type 
estimators. Jour. Amer. Stat. Assoc. 53, 491-508. 

[2] Hartley, H. O. and Ross, A. [1954]. Unbiased ratio estimators. Nature 174, 270. 

[3] Robson, D. S. [1957]. Applications of multivariate polykays to the theory of 
unbiased ratio-type estimation. Jour. Amer. Stat. Assoc. 52, 511-522. 

[4] Tukey, J. W. [1956]. Keeping moment-like sampling computations simple. 
Ann. Math. Stat. 27, 37-54. 




THE MATHEMATICAL THEORY OF BIOLOGICAL ASSAY 
OF A LOCAL ANAESTHETIC 


N. K. CHAKRAVARTI 
Defence Research Laboratory (Stores), Kanpur, India. 


INTRODUCTION 


No satisfactory method exists for biologically assaying the potency 
of a local anaesthetic. For a long time, rough estimates were made by 
comparing threshold concentrations. The first attempt at formulating 
a satisfactory method was by Chance and Lobstein [1944], who suggested 
a method based on the degree of anaesthesia. Their method consisted 
in applying 5 stimuli in 5 minutes on the anaesthetised eye of a test 
animal, and noting the number of responses (absence of perception). 
A straight line is then fitted between the probit of percentage response 
and logarithm of concentration. Bülbring and Wajda [1945] suggested 
another method which is essentially similar to that of Chance and 
Lobstein. In the intracutaneous wheal method, they applied six 
pricks every five minutes up to a period of 30 minutes inside the wheal, 
and noted the percentage response among the thirty-six pricks applied. 
A straight line relationship was then plotted between the percentage 
response and the logarithm of concentration. No theoretical justifications 
have been advanced, either by Chance and Lobstein or by 
Bülbring and Wajda, for assuming these types of relationships, and their 
methods remain essentially arbitrary. Young [1951] takes duration 
of anaesthesia rather than the degree of anaesthesia as the criterion, 
and uses a linear relationship between the logarithm of the duration 
time and the logarithm of the concentration of the local anaesthetic 
applied, suggested, also arbitrarily, by Sinha [1936, 1939a]. Other 
authors, including Sinha [1939b] have used a linear relationship between 
the duration of anaesthesia and the logarithm of concentration. 

The many forms of relationships used and the essentially arbitrary 
nature of all of them have led Gray and Geddes [1954] to state that 
there is a confusion of methods and no one knows what is appropriate. 
A theoretically sound method that can clear this confusion is necessary 
so that the vast accumulation of data from research laboratories throughout 
the world can be compared and put to use. 
MODE OF ACTION 


In a review paper, Gray and Geddes [1954] have presented an 
account of the present state of knowledge as regards the mode of action 
of local anaesthetics. A summary of the relevant portions from their 
paper is presented here. 

Chemically, local anaesthetics are weak bases, which have to come 
in contact with nerve-fibres as free base so as to be effective. Water 
solubility of free local anaesthetic base is low, and therefore they are 
used in the form of soluble salts, which are a combination of 
a weak base with a strong acid. Addition of alkali precipitates free local 
anaesthetic base. When soluble local anaesthetic salts are injected 
into a medium, the extent of precipitation depends upon the alkalinity 
(pH) and concentration of the medium. Free base is precipitated more 
readily from a strong than from a weak solution. Precipitation also depends 
upon the type of the local anaesthetic. 

Injected solutions, provided they are not too acidic, rapidly become 
alkaline, since body tissues have a considerable buffering capacity. 
Active base is, therefore, liberated from solutions of local anaesthetics 
on injection into the tissues. 

Local anaesthetics are more soluble in lipoids than in water. Nerve- 
tissue being rich in lipoids, local anaesthetics easily enter into the 
lipoid rich plasma-membrane, which is a specialised membrane of 
lipoid and protein perhaps only two molecules thick. There is a polar 
association between the amino group common to local anaesthetics 
and polarised lipoids in the plasma membrane. 

In the resting state, the interior and exterior of the plasma membrane 
are at different potentials. The interior is at a negative potential to 
the exterior. The demarcation potential is due to an asymmetrical 
distribution of ions on either side of the plasma membrane. This, in 
turn is accounted for by (a) the permeability characteristic of the 
membrane, and (b) an active mechanism extruding sodium ions. 

Potassium and chloride ions easily permeate the plasma membrane, 
but not so the sodium, amino acid and protein ions. During activity, 
the sodium ions freely permeate the plasma membrane, due to acti- 
vation of a carrier mechanism, and to the activity of some transmitter 
(acetylcholine) liberated in the nerve fibre. Spread of electric current 
from the neighbouring active area triggers off the carrier mechanism, 
or the production of the carrier substance. When sodium ions first 
enter into the nerve fibre, potassium ions are not immediately extruded. 
Thus the interior of the fibre becomes electropositively charged in 
relation to the exterior—a state of reverse polarisation. There is now 
a migration of potassium ions outwards and the potential difference is 
neutralised. During recovery, extrusion of sodium ions takes place 
and potassium ions re-enter the axon. The resting state of polarisation 
is thus restored. The changes in potential thus caused trigger off 
similar changes in the neighbouring area of the axon, and the stimulus 
is carried on. Local anaesthetics prevent the ionic migrations necessary 
for the conduction of the stimulus. 

It has been found that a stimulus applied at a nerve fibre causes 
the release of acetylcholine. Liberated acetylcholine, through the 
catalytic action of an enzyme (cholinesterase), reacts with phospho- 
creatine and thus triggers off the adenyl-phosphate chain reaction. 
Because local anaesthetic molecules have structural similarities to 
acetylcholine, they compete with acetylcholine for the specific enzyme 
(cholinesterase) concerned in the chain reaction, and thus interfere 
with the propagation of the stimulus. 


THEORY 


Assume that when a stimulus is applied, », molecules of acetyl- 
choline are released in the lipoid (n, of course depends on the species of 
animal and site and area of application). Consider, apart at first from 


the stimulus, what happens when a local anaesthetic is applied. 
Let 


c = fractional concentration of the local anaesthetic (0 < c < 1), 
ν = number of molecules of the local anaesthetic base per unit volume, 
λ = ν/c, and 
f = number of molecules of local anaesthetic base precipitated per 
unit volume. 


According to Gray and Geddes, as stated in the second paragraph 
of the mode of action, precipitation depends upon concentration, the 
precipitation being more complete the stronger the solution. This 
phenomenon can be explained by precipitation being an exponential 
function of the concentration such that, at low concentration, the 
precipitation will be still smaller and the precipitation is complete at 
100 % concentration. Assume that the required function is 


where ν' is the number of the local anaesthetic base molecules contained 
per unit volume of a 100 % strong solution, and β and β' are positive 
constants such that β = λβ'. Of course, ν'/λ = C_100% = 1. 

As time elapses, the concentration of the local anaesthetic falls; 
over time ¢, the relation 
fo/fe = (2) 
will hold. 
In a catalysed unimolecular reaction (which, in effect, is what is 
happening here, with cholinesterase as the catalytic agent), we have 
(cf. Moelwyn-Hughes [1950]), 


log (fo/f.) = Kt. (3) 
Hence, from (2) and (3), 


log C_t = log C_0 − β(1 − C_0) − Kt.    (4) 


Returning now to the application of the stimulus in the presence of 
the anaesthetic, we assume that the released molecules of acetylcholine 
compete with the local anaesthetic base molecules, and that the result 
determines whether or not a stimulus is perceived. It is assumed that 
the enzyme (cholinesterase) allows only one molecule at a time to enter 
into reaction with phosphocreatine. The number and composition of 
the reacting molecules n, + f, only are of importance. It is also assumed 
that the local anaesthetic base and acetylcholine molecules get thoroughly 
mixed in the lipoid so that there is a random arrangement of these 
reacting molecules in the mixture. Hence the probability of an acetyl- 
choline molecule reacting with phosphocreatine 


P, = + fr) (5) 


is the probability of perception. 

If P_t is the probability of perception, Q_t = 1 − P_t will be the probability 
of suppression of the stimulus by the local anaesthetic, i.e., the 
probability of response. Then it is easily shown that 


Q,/P. = f./n. = OC. (6) 
Taking logs and substituting from (4), we reach the 3-parameter system 
log Q./P, = log (A/n.) + log Co — BIL — Co) — Kt, (7) 


which may be rewritten, with the logit metameter Y = log (Q/P), 
as follows: 


Y_t = log C_0 + βC_0 − Kt − α.    (8) 


This equation expresses the relation between the probability of 
response, the initial concentration, and the time elapsed after topical 
application. 
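
The relation can be sketched numerically as follows. This is an illustration only: the function names are hypothetical, and logarithms are taken to base 10, an assumption that reproduces the figures of Table I. The parameter values are those quoted in the Experimental Verification section.

```python
import math


def logit_response(c0, t, alpha, beta, k):
    """Equation (8): Y = log10(C0) + beta*C0 - K*t - alpha."""
    return math.log10(c0) + beta * c0 - k * t - alpha


def response_probability(y):
    """Invert Y = log10(Q/P), with P = 1 - Q, to recover Q."""
    odds = 10.0 ** y
    return odds / (1.0 + odds)


# alpha, beta and K as reported in the Experimental Verification section.
alpha, beta, k = -2.3677, 17.1399, 0.0724

# Concentration 0.0125 (per cent), 5 minutes after application.
y = logit_response(0.0125, 5.0, alpha, beta, k)
q = response_probability(y)
print(round(y, 4), round(q, 4))
```

The two printed values may be compared with columns (7) and (9) of the first row of Table I.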


In a bioassay, an unknown concentration C_u is assayed against r 
known concentrations C_1, ..., C_r. We shall consider the following 
three types of observations taken in such a bioassay. 


(i) A set of n stimuli applied at each of m selected intervals of time 
t_1, ..., t_m per concentration per animal. The relationship is then 


Y_t − (log C + βC − α) + Kt = 0.    (9) 


(ii) A set of n stimuli applied after one fixed interval of time t per 
concentration per animal. The relationship is then 


Y_i − log C_i − βC_i + δ = 0,    (10) 


where δ is a constant. 

(iii) Duration of anaesthesia obtained by applying stimuli in succession 
so that the middle point of time between the last failure to perceive 
and the first perception is taken as the duration of anaesthesia. 


Reverting to equation (3), we note that f_t → 0 as t → ∞. Hence 
theoretically the effect of anaesthesia persists over a very long period. 
However, the plasma membrane being very small (only about two 
molecules thick), the number of local anaesthetic base molecules dis- 
solved in the lipoid is also small. In practice, then, there will be no 
local anaesthetic base molecules in the lipoid, after the lapse of a finite 
period of time. This discrete case can be treated in terms of the con- 
tinuous case by assuming that there is a threshold concentration below 
which the chance of perception is so small that the effect of anaesthesia 
is considered over, so that, the time to reach this concentration is 
considered the duration time. At this threshold concentration, P and 
hence Q are fixed so that log Q/P is a constant. 

The relationship (8) then assumes the form 


T_i = A + BC_i + γ log C_i.    (11) 


FITTING OF THE RELATIONSHIP 


(i) When a set of n stimuli are applied at m selected intervals of 
time t_1, ..., t_m, we have from relation (9) 


Y_it = (log C_i + βC_i − α) − Kt = μ_i − Kt,    (12) 


where 
μ_i = log C_i + βC_i − α.    (13) 


Now V(Y_it) = 1/nP_it Q_it. Hence the weighting factor for the response 
metameter Y_it is s/V(Y_it), i.e., 


W_it = nsP_it Q_it,    (14) 
if there are s test animals per concentration. Considering only the ith 
concentration, μ_i = const., and to estimate μ_i and K, we minimise 
Σ_t W_it (Y_it − μ_i + Kt)², where m_i intervals of time out of m show 
The estimated value of K is averaged over all the r + 1 concen- 
trations and we obtain 


r+1 
K = 2, mooi (15) 
t=1 
where 
wit) 
t Wit 
and 


mo. wit — (17) 


The values of μ_i corresponding to the r + 1 concentrations are then 
obtained from 


The values of α and β are then estimated by applying the method 
of least squares and minimising 


Σ (μ_i − log C_i − βC_i + α)², 


the summation being taken over the known concentrations only. The 
solutions are obtained as 


a= a’, (19) 
and 
Swe 
p= (20) 
where 
wi = wy — log, (21) 
, =—/ 
and 


Now μ_u (corresponding to C_u), α and β are known, and hence C_u can be 
found from the equation μ_u = log C_u + βC_u − α. 

(ii) When observations are taken only after a fixed interval of time 
t, we use relation (10), and to estimate β and δ, we apply the method of 
least squares and minimise Σ w_i (y_i − log C_i − βC_i + δ)² and obtain 
the solution as 

A at 
(24) 
and 
8,-. 
where 
yi = ys — loge, , (26) 
A — 1 = ay’. 
1 
sy. = (yi — — 4, 
wilyi — — 6) (28) 
and 


The unknown C_u is then obtained from y_u = log C_u + βC_u − δ. 
(iii) When the duration of anaesthesia T is measured, we use equation 
(11), and to estimate A, B and γ, apply the method of least squares 
and minimise Σ_{i=1}^{r} (T_i − A − BC_i − γ log C_i)². 
The solutions are obtained as 


TTL. log C; 


r C; log C; 
and so on. 
Then C_u is solved from the equation T_u = γ log C_u + BC_u + A. 
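
The least-squares fit for case (iii) can be sketched as below. This is a hedged illustration rather than the author's computing scheme: it forms the three normal equations for T = A + BC + γ log C and solves them by Gaussian elimination. The function names are hypothetical, logarithms are assumed to be to base 10, and the durations are synthetic, generated exactly from equation (a) of Table II so that the fit can be checked against known coefficients.

```python
import math


def solve_linear_system(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_duration_curve(concs, durations):
    """Least-squares estimates (A, B, gamma) for T = A + B*C + gamma*log10(C)."""
    rows = [[1.0, c, math.log10(c)] for c in concs]
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * t for r, t in zip(rows, durations)) for i in range(3)]
    return tuple(solve_linear_system(normal, rhs))


# Synthetic durations generated from equation (a) of Table II (not observed data).
concs = [1.25, 2.5, 5.0, 10.0, 20.0, 50.0]
durations = [9.751 + 0.648 * c + 16.554 * math.log10(c) for c in concs]
A, B, gamma = fit_duration_curve(concs, durations)
print(round(A, 3), round(B, 3), round(gamma, 3))
```

At least four concentrations are needed if any degrees of freedom are to remain for the standard error of estimate.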


FIDUCIAL LIMITS OF C 


(i) When a set of n stimuli are applied on a test area of an animal 
at a few selected intervals of time t_1, ..., t_m, we obtain the variance 
of β, i.e. V(β), by using the relations (20) and (18) as 

> Wit 
t=1 

The fiducial limits are obtained by solving log C_u + (β ± t√V(β)) 
C_u = α + μ_u, where t is the t-statistic based on ns[Σ_{i=1}^{r+1} m_i − 3] degrees 
of freedom. 

(ii) When observations are taken only after a fixed interval of time, 
by using equation (24), we obtain V(β) as 


The fiducial limits are obtained by solving the equation log C_u + 
(β ± t√V(β))C_u = y_u + δ, where t is the t-statistic with ns − 2 
degrees of freedom. 

(iii) When the duration of anaesthesia (T) is measured, we have 
from (11), 


T_i = A + BC_i + γ log C_i,    (33) 


where A, B and γ are constants. The constants A, B and γ are determined 
from the known concentrations and duration times corresponding 
to them. Relation (11) is a multiple regression equation and 
hence the standard error of estimate of T is given by 


(Tove 
see (T) = += 


Then the fiducial limits of C_u are obtained by solving T_u ± t · see (T) = 
A + BC_u + γ log C_u, where t is the t-statistic at the desired probability 
level with r − 3 degrees of freedom. 


SOLUTION OF EQUATION log x + βx = α 


The equation log x + βx − α = 0 can be solved for x numerically 
by iterative methods (cf. Uspensky [1948]). The method can be described 
thus: log x + βx − α = 0 or, x = (α/β) − (log x)/β. 
The right hand side of the equation is a decreasing function of x, and 
we know the two values C_i and C_{i+1} within which x lies by reference to 
the probabilities or duration times. 

If x_0 is a value within the interval (C_i, C_{i+1}), we can obtain a better 
approximation x_1 thus: x_1 = (α/β) − (log x_0)/β. 

Further improved approximations can be obtained thus: x_2 = α/β − 
(log x_1)/β, x_3 = α/β − (log x_2)/β, and so on till x_i ≈ x_{i+1} to the desired 
degree of approximation. 
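
The successive-substitution scheme just described can be sketched as follows. The function name and stopping rule are illustrative assumptions, and logarithms are taken to base 10, which is the reading consistent with the worked figures in the Experimental Verification section. The iteration settles only where 1/(βx ln 10) is below 1 in absolute value near the root, which holds for the figures used here.

```python
import math


def solve_by_iteration(alpha, beta, x0, tol=1e-8, max_iter=200):
    """Iterate x <- alpha/beta - log10(x)/beta until successive values agree."""
    x = x0
    for _ in range(max_iter):
        x_next = alpha / beta - math.log10(x) / beta
        if x_next <= 0:
            raise ValueError("iterate left the domain of the logarithm; try another x0")
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")


# Worked figures from the Experimental Verification section:
# log C + 17.2592 C = -0.6096, with the root lying between 0.0250 and 0.1000.
root = solve_by_iteration(alpha=-0.6096, beta=17.2592, x0=0.05)
print(round(root, 4))
```

Successive iterates fall alternately above and below the root, so two consecutive values bracket it.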

A second method due to Newton (cf. Uspensky [1948]) can be 
described thus: 

Let f(x) = log x + βx − α. 

Now choose a value x_0 such that f(x_0)f''(x_0) > 0. Then the successive 
approximations are given by x_1 = x_0 − f(x_0)/f'(x_0), x_2 = x_1 − 
f(x_1)/f'(x_1), and so on till x_i ≈ x_{i+1} to the desired degree of approximation. 
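
Newton's method for this equation can be sketched in the same way. The function name and tolerance are assumptions, and base-10 logarithms are again assumed. Since f''(x) is negative for every positive x, the condition f(x_0)f''(x_0) > 0 amounts to choosing x_0 below the root.

```python
import math

LN10 = math.log(10.0)


def solve_by_newton(alpha, beta, x0, tol=1e-10, max_iter=50):
    """Newton's method for f(x) = log10(x) + beta*x - alpha."""
    x = x0
    for _ in range(max_iter):
        f = math.log10(x) + beta * x - alpha
        f_prime = 1.0 / (x * LN10) + beta
        x_next = x - f / f_prime
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("Newton iteration did not converge")


# Start at the known concentration 0.0250, which lies below the root,
# so that f(x0) < 0 and hence f(x0) * f''(x0) > 0.
root = solve_by_newton(alpha=-0.6096, beta=17.2592, x0=0.025)
print(round(root, 4))
```

With such a starting value the iterates increase steadily towards the root and never leave the domain of the logarithm.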


EXPERIMENTAL VERIFICATION 


(i) In the case where n stimuli are applied at each of m intervals 
of time, the data reported by Bülbring and Wajda [1945] are analysed 
here. They used 6 replicates at each of 4 concentrations, and a set of 
6 pricks were applied at intervals of 5 minutes up to a period of 30 
minutes inside each test wheal. Thus there are a total of 36 pricks at 
each concentration and time interval. The summarised data are 
presented here (omitting 100 % and 0 % responses) in Table I. 
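
The response metameter and weight used in this analysis can be sketched as follows. The function names are hypothetical, and base-10 logarithms are assumed because they reproduce the tabulated values. Responses of 0 % and 100 % have no finite logit, which is why they are omitted from the table.

```python
import math


def logit_metameter(q):
    """Y = log10(Q / P), with P = 1 - Q; undefined at Q = 0 or Q = 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("0% and 100% responses have no finite logit")
    return math.log10(q / (1.0 - q))


def weight(q, n, s):
    """Equation (14): W = n * s * P * Q."""
    return n * s * q * (1.0 - q)


# First row of Table I: observed Q = 0.8051 at 0.0125% and 5 minutes,
# with n = 6 pricks and s = 6 replicates.
y = logit_metameter(0.8051)
w = weight(0.8051, n=6, s=6)
print(round(y, 4), round(w / 36, 4))
```

The printed values may be compared with columns (4) and (5) of Table I.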

Using the data presented in columns (1) through (5) and equations 
(15) and (18), we obtain: 


K = 0.0724, μ_.0125 = 0.7281, μ_.0250 = 1.2222, μ_.0500 = 1.7954, 
and μ_.1000 = 3.1327. 


These values are used to calculate column (6). Again using equations 
(19) and (20), we obtain: 


α = −2.3677, and β = 17.1399. 


From equation (13), and using these values of α and β, we obtain 
the calculated values of the μ's, say μ*'s, as 


μ*_.0125 = 0.6789    μ*_.0250 = 1.1942 
μ*_.0500 = 1.9237    μ*_.1000 = 3.0817. 


Using these values, we obtain column (7). Then columns (8) and 
(9) are calculated from columns (6) and (7) respectively. A com- 
parison with column (3) shows quite satisfactory agreement. 
TABLE I 


DATA ON THE ACTION OF LOCAL ANAESTHETIC (FROM BÜLBRING AND 
WAJDA) AND FITTING OF THE RELATIONSHIP Y_t = μ_i − Kt. 


                                                Y calculated from   Q calculated from
            Time                                                       Y based on
Conc.    in Min.     Q obs     Y obs        PQ         μ        μ*         μ        μ*
(1)          (2)       (3)       (4)       (5)       (6)       (7)       (8)       (9)

.0125%         5    0.8051    0.6160    0.1569    0.3663    0.3170    0.6992    0.6748
              10    0.5000    0.0000    0.2500    0.0045   −0.0448    0.5026    0.4724
              15    0.2775   −0.4156    0.2005   −0.3574   −0.4066    0.3052    0.2817
              20    0.0550   −1.2272    0.0520   −0.7192   −0.7684    0.1603    0.1455

.0250%        10    0.7763    0.5404    0.1737    0.4986    0.4705    0.7591    0.7412
              15    0.5279    0.0485    0.2492    0.1367    0.1087    0.5781    0.5622
              20    0.4169   −0.1457    0.2431   −0.2251   −0.2531    0.3733    0.3583
              25    0.1949   −0.6160    0.1569   −0.5869   −0.6150    0.2057    0.1954

.0500%         5    0.9719    1.5389    0.0273    1.4336    1.5619    0.9645    0.9733
              10    0.8621    0.7960    0.1189    1.0718    1.2000    0.9219    0.9407
              15    0.8051    0.6160    0.1569    0.7099    0.8382    0.8368    0.8733
              20    0.7224    0.4154    0.2005    0.3481    0.4764    0.6903    0.7497
              25    0.5556    0.0970    0.2469   −0.0137    0.1146    0.4921    0.5656
              30    0.3051   −0.3575    0.2120   −0.3755   −0.2473    0.2963    0.3614

.1000%        25    0.9719    1.5389    0.0273    1.3235    1.2726    0.9547    0.9493
              30    0.8887    0.9023    0.0989    0.9617    0.9108    0.9015    0.8906


Next, for purposes of assaying potency, let us assume that the 
concentration .0500% was unknown and that we have the observations 
from which it is required to estimate the potency. Using the values 
of K and μ's in respect of the three known concentrations and putting 
them in equations (19) and (20), we obtain 


α = −2.4050 and β = 17.2592. 

Now, putting the values of α, β and μ_.0500, i.e. μ_unknown, in equation 
(13), we get 

log C_u + 17.2592 C_u = −0.6096, 

whence, solving, we obtain C_u = 0.0436. Using equation (31), we obtain 
V(β) = 6.4427. V(β) is based on 468 degrees of freedom. For such 
large degrees of freedom, the normal deviate can be used instead of the 
t-statistic. Thus the equations to determine the fiducial limits at the 5 % 
level of probability are 


log C_u + C_u(17.2592 ± 4.9950) = −0.6096, 


whence, upon solving, we obtain the fiducial limits as 0.0370 and 
0.0537. 
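
The point estimate and fiducial limits can be checked numerically with a short sketch. The helper name is hypothetical and Newton's method with base-10 logarithms is assumed. The slope 17.2592, the half-width 4.9950 and the right-hand side −0.6096 are the figures quoted above.

```python
import math

LN10 = math.log(10.0)


def solve_log_linear(rhs, slope, x0=1e-3, tol=1e-12, max_iter=100):
    """Solve log10(x) + slope*x = rhs by Newton's method, starting below the root."""
    x = x0
    for _ in range(max_iter):
        step = (math.log10(x) + slope * x - rhs) / (1.0 / (x * LN10) + slope)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("no convergence")


beta, half_width, rhs = 17.2592, 4.9950, -0.6096
estimate = solve_log_linear(rhs, beta)
lower = solve_log_linear(rhs, beta + half_width)  # larger slope gives the lower limit
upper = solve_log_linear(rhs, beta - half_width)  # smaller slope gives the upper limit
print(round(lower, 4), round(estimate, 4), round(upper, 4))
```

The results agree with the quoted values 0.0370, 0.0436 and 0.0537 to within about one unit in the fourth decimal place.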

(iii) In this case, where the duration of anaesthesia T is used rather 
than the percentage response in partial anaesthesia, Young [1951] in 
his Table VII has reported durations of cocaine anaesthesia found by 
various authors. In all those cases where data have been reported for 
4 or more concentrations, equation (33) is fitted, and the equations 
obtained are presented in Table II. 


TABLE II 

FITTED EQUATIONS TO DATA ON DURATION TIMES OF A LOCAL ANAESTHETIC. 

Source                                       Equation 

(a) Young (General):                         T = 9.751 + 0.648C + 16.554 log C 
(b) Young (Sp. Exp):                         T = 13.750 + 0.263C + 21.524 log C 
(c) Cohen [1925]:                            T = 6.271 + 3.370C + 1.699 log C 
(d) Uhlmann [1930]:                          T = 34.142 + 1.538C + 28.530 log C 
(e) Bonet [1931]:                            T = 5.517 + 0.920C + 5.050 log C 
(f) Sinha [1936]:                            T = 4.751 + 0.678C + 6.643 log C 
(g) Sinha [1939a]:                           T = 2.073 + 0.279C + 3.310 log C 
(h) Young [1951] (Table VI):                 T = 9.572 + 0.886C + 14.320 log C 
(i) Young [1951] (Table VIII, Cocaine):      T = 17.270 − 0.065C + 24.897 log C 
(j) Young [1951] (Table VIII, Nupercaine):   T = 98.634 + 122.917C + 55.003 log C 


The observed and calculated values (times in minutes) are tabulated 
below in Table III. (Calculated values shown in parentheses). 

These results show excellent agreement except in the case of (i), 
where β, the coefficient of C, is negative, and in the case of (h), where 
the expected values differ considerably from the observed values. The 
negative value of β in the case of (i) can be explained by random 
fluctuations only. Considering now the observed values of (h), we note 
that these are logarithmic averages and not arithmetic averages. 
Moreover, the experiments have been conducted over a number of years, 
and hence the combination of such data is of doubtful validity. 
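
The calculated values in Table III can be reproduced directly from the fitted equations of Table II. The following is a minimal sketch, assuming that "log" denotes the common (base 10) logarithm, an assumption which reproduces the tabulated values; the names `TABLE_II` and `fitted_duration` are illustrative only and do not come from the original paper.

```python
import math

# Coefficients of T = const + slope*C + log_coef*log10(C), copied from Table II.
# Only three of the ten sources are listed here, for brevity.
TABLE_II = {
    "a": (9.751, 0.648, 16.554),    # Young (General)
    "b": (13.750, 0.263, 21.524),   # Young (Sp. Exp)
    "i": (17.270, -0.065, 24.897),  # Young [1951], Table VIII, cocaine
}


def fitted_duration(source, conc):
    """Fitted duration of anaesthesia (minutes) at concentration conc (mg/c.c.)."""
    const, slope, log_coef = TABLE_II[source]
    # Assumption: the logarithm in the fitted equations is to base 10.
    return const + slope * conc + log_coef * math.log10(conc)


if __name__ == "__main__":
    for conc in (1.25, 2.5, 5.0, 10.0, 20.0, 50.0):
        print(conc, round(fitted_duration("a", conc), 1))
```

For source (a) the printed values agree, to the rounding used in Table III, with the parenthesised entries of that row.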

Next, for the purpose of assay, let us assume that the concentration 
20.0 mg/c.c. in the data at (a) Young (General) is not known, and that 
the potency has to be estimated from the observed duration of anaes- 
thesia. The equation 
TABLE III 

OBSERVED AND FITTED VALUES (IN PARENTHESES) OF DURATION 
TIMES OF A LOCAL ANAESTHETIC. 

                      Cocaine concentrations in mg./c.c. 
Source                1.25    2.00    2.50    5.0     10.0    20.0    40.0    50.0    100.0 
(a) Young             13              17      24      33      45              70 
    (General)         (12.2)          (18.0)  (24.6)  (32.8)  (44.2)          (70.3) 
(b) Young                             23      30      38      47 
    (Sp. Exp)                         (23.0)  (30.1)  (37.9)  (47.0) 
(c) Cohen             10              17      23      42      46* 
    [1925]            (10.6)          (15.4)  (24.3)  (41.7)  (75.9) 
(d) Uhlmann           39              49      62      78 
    [1930]            (38.8)          (49.3)  (61.8)  (78.1) 
(e) Bonet                                     14      19      31              60 
    [1931]                                    (13.6)  (19.8)  (30.5)          (60.1) 
(f) Sinha                             9       13      18      27 
    [1936]                            (9.1)   (12.8)  (18.2)  (27.0) 
(g) Sinha                             4       6 
    [1939a]                           (4.1)   (5.8)   (8.2)   (12.0) 
(h) Young [1951]              (19.9)  18.8    23.7    40.6    53.6    58.5    114.6   131.8 
    (Table VI)                (15.7)  (17.5)  (24.0)  (32.8)  (45.9)  (68.0)  (78.2)  (126.8) 
(i) Young [1951]                      26.9    34.7    41.3    48.4 
    (Table VIII,                      (27.0)  (34.4)  (41.5)  (48.4) 
    Cocaine) 

                      Nupercaine concentrations in mg./c.c. 
Source                .025    .050    .100    .200 
(j) Young [1951]      13.8    32.8    56.3    84.7 
    (Table VIII,      (13.6)  (33.2)  (55.9)  (84.8) 
    Nupercaine) 

*Estimated from a regression equation of the type log T = a - b log C. 


T=A+BC 


is now fitted to the data for the 5 known concentrations (excluding the 
unknown concentration). The equation is 


T = 9.911 + 0.662C + 15.909 log C. 


Putting T = 45 in this equation, we obtain C = 21.2 mg/c.c. Now, 
see (T) = 0.875. The fiducial limits of T at the 5% level of probability 
are 45 ± 4.04 min., i.e., 40.96 and 49.04 min. Putting these values 
in the equation, we obtain the fiducial limits of C as 17.21 mg/c.c. 
and 25.36 mg/c.c. 
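
Because the fitted equation cannot be solved for C in closed form, the potency estimate and its limits have to be found numerically. The sketch below inverts the equation by bisection; it assumes base 10 logarithms, and the function names and the search interval (0.1 to 200 mg/c.c.) are illustrative choices rather than anything stated in the paper.

```python
import math


def duration(conc, const=9.911, slope=0.662, log_coef=15.909):
    """Fitted duration T (minutes) from the five known concentrations."""
    return const + slope * conc + log_coef * math.log10(conc)


def concentration_for(t, lo=0.1, hi=200.0, tol=1e-9):
    """Concentration C (mg/c.c.) at which the fitted duration equals t.

    Bisection is valid here because both the linear and the logarithmic
    coefficients are positive, so the curve increases with C.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if duration(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for t in (45.0, 40.96, 49.04):
        print(t, round(concentration_for(t), 2))
```

Feeding in T = 45 and the two limits of T recovers, to rounding, the estimate of about 21.2 mg/c.c. and the limits 17.21 and 25.36 mg/c.c. quoted above.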
DISCUSSION 


The relationship log Q,/P, = log C + BC — kt — a. derived from 
theoretical considerations shows excellent agreement with the observed 
data reported by many authors. The many forms of relationships 
assumed arbitrarily are no longer necessary, and a uniform theoretically 
valid relationship replaces all the arbitrary ones successfully. 

In the procedures adopted by Chance and Lobstein [1944], and 
Bülbring and Wajda [1945], the observations are taken up to a fixed 
period only. It is observed that precision will be improved if obser- 
vations are continued till a 0-response is obtained, which can be done 
with a little extra expenditure of time and money. 

This theory shows that there is no fundamental difference between 
criteria based on duration of anaesthesia and degree of anaesthesia, 
and the connection is brought out clearly. However, the criterion based 
on degree of anaesthesia is expected to lead to more precise results, 
since the duration is a mean of two observations only, whereas the 
degree of anaesthesia, as per Bülbring and Wajda, is based on six sets 
of six observations each; thus, for the same number of animals, many 
more observations are obtained in the case of degree of anaesthesia. 

The mode of action as described above offers an explanation of the 
observed fact that the action of a local anaesthetic depends only on the 
concentration applied and not on the amount injected. For example, 
in probit analysis, we use the logarithm of the concentration and not 
of the amount ingested or taken up. The amount of the lipoid in the 
nerve fibres at the area of application is small and fixed. The amount 
of the local anaesthetic solution that can dissolve in the lipoid is, there- 
fore, small and fixed. Hence, whatever the amount of the local anaes- 
thetic solution injected, only a small fixed amount is dissolved in the 
lipoid which alone is responsible for the action, the rest being destroyed 
or excreted without producing any anaesthetising effect (provided, of 
course, that the amount is not too heavy to cause permanent damage 
to the tissue). The amount of the local anaesthetic dissolved in the 
lipoid is thus proportional to the concentration. The concentration 
is therefore relevant but not the amount injected. 
148 NOTE: Calculation of Inbreeding in Family Selection Studies 
on the IBM-650 Data Processing Machine¹ 


K. HOEN AND A. H. E. GRANDAGE² 


North Carolina State College, 
Raleigh, North Carolina, U.S.A. 


In family selection studies it is often desirable to keep a record of 
the inbreeding of individual families. When selection is to be con- 
tinued over many generations, the calculation of inbreeding coefficients 
by pedigree analysis becomes very complicated in advanced generations. 

Emik and Terrill [1949] described a method of calculating the in- 
breeding of individuals in a given generation by using parent-offspring 
numerator relationships (= genetic covariances). A similar method 
has been used at North Carolina State College in a family selection 
study with two populations of corn. The selection procedure has been 
described by Robinson and Comstock [1955]. A program has now been 
written for calculating the inbreeding for all possible crosses among 
selected families in each selection cycle on an IBM-650 machine. The 
program applies to the following situation: 

A selection study is started by mating each of m males in a population 
to n different females. After testing the mn progenies for yield or 
production, a certain number, say p, are selected to make up a new 
population. Recombinations among the selected families are made 
in a way which insures approximately equal representation in the new 
population. Another cycle is started by obtaining a new series of 
biparental progenies in this new population. Inbreeding can be mini- 
mized by only recombining families with the lowest degree of relation- 
ship. 


¹Contribution from the Departments of Genetics and Experimental Statistics, North Carolina 
Agricultural Experiment Station, Raleigh, North Carolina, as Journal Paper 1064. The work was 
supported in part by the Rockefeller Foundation. 

²Formerly Assistant Geneticist, Departments of Genetics and Experimental Statistics, and 
presently at Stichting voor Plantenveredeling, Wageningen, Netherlands; and Associate Professor 
of Experimental Statistics. 
The program developed at North Carolina State College calculates 
relationships among selected families directly from relationships among 
the grandparent families selected in the previous cycle. It disregards 
relationships among the m males and mn females of the generation in 
between, which are of no interest in a selection scheme of this type. 
Careful pedigree records have to be kept, however, since it is essential 
that each family can be traced back to the individuals which were its 
grandparents. 

In discussing the basic concepts used in the program, consider two 
selected families P and Q with ancestries as follows: 


Family    Parents    Grandparents 
  P          X          A  B 
             Y          C  D 
  Q          Z          E  F 
             W          G  H 


The numerator relationship or genetic covariance of X and Y, say, is the 
numerator of the coefficient of relationship as defined by Wright (1922). 
The inbreeding coefficient $F_P$ of any member of P is equal to half the 
numerator relationship of its parents (which will be called covariance 
here): 

\[ F_P = \tfrac{1}{2}\,\mathrm{cov}(X, Y). \]

It can be derived that the covariance of a random member of family P 
with a random member of Q is equal to the average of the covariances 
of the four interparental combinations: 

\[ \mathrm{cov}(P, Q) = \tfrac{1}{4}\,[\mathrm{cov}(X, Z) + \mathrm{cov}(X, W) + \mathrm{cov}(Y, Z) + \mathrm{cov}(Y, W)]. \]

Also 

\[ \mathrm{cov}(X, Z) = \tfrac{1}{4}\,[\mathrm{cov}(A, E) + \mathrm{cov}(A, F) + \mathrm{cov}(B, E) + \mathrm{cov}(B, F)]. \]

From this it can easily be derived that the covariance between P and 
Q can be written as: 

\[ \mathrm{cov}(P, Q) = \tfrac{1}{16}\,[\mathrm{cov}(A, E) + \mathrm{cov}(A, F) + \mathrm{cov}(A, G) + \mathrm{cov}(A, H) \]
\[ \qquad + \mathrm{cov}(B, E) + \cdots + \mathrm{cov}(B, H) + \mathrm{cov}(C, E) + \cdots + \mathrm{cov}(C, H) + \mathrm{cov}(D, E) + \cdots + \mathrm{cov}(D, H)]. \]

Two other formulas used in the program are one for the covariance 
of an individual i of family P with itself, 

\[ \mathrm{cov}(P_i, P_i) = 1 + \tfrac{1}{2}\,\mathrm{cov}(X, Y) = 1 + \tfrac{1}{8}\,[\mathrm{cov}(A, C) + \mathrm{cov}(A, D) + \mathrm{cov}(B, C) + \mathrm{cov}(B, D)], \]

and one for the covariance of two full sibs, 

\[ \mathrm{cov}(P_i, P_j) = [1 + \tfrac{1}{2}\,\mathrm{cov}(X, Y)]/2. \]


Going back to the selection scheme described above, let us assume that 
a two-way table is available containing the covariances of all p(p + 1)/2 
combinations among the grandparents of p newly selected families. 
The off-diagonal table entries are covariances of any member of a given 
family with a random member of another family. The diagonal entries 
are covariances of a member of a given family with itself. The covari- 
ance of two full sibs (different members of the same family) is a diagonal 
entry divided by two. The calculation of covariances for combinations 
among the newly selected families can be done on an IBM-650 machine 
after making the following preparations: 


(a) A load card is punched for each table entry containing: 


in word 1: 000000hijk, where hijk is the location of the first in- 
struction of the storage program for these cards, and 

in word 8: abcefgxyzw, where abc and efg are two grandparental 
families, and xyzw is the covariance of a random member 
of abc and a random member of efg. Cards correspond- 
ing to diagonal entries have in word 8 a number like 
abcabcxyzw, where xyzw is now the covariance of a 
member of abc with itself minus 1. The 1 has to be 
omitted when the covariances should contain 4 decimals. 
(b) A load card is punched for each newly selected family with: 


in word 1: 000000prst, where prst is the location of the first instruc- 
tion of the storage program for these cards, 

in word 4: parents of the family, parent 1 in columns 31-35, 
parent 2 in columns 36-40, 

in words 5 and 6: parents of parent 1; these are two grandparents 
of the selected family, here identified as to family and 
individual, and 

in words 7 and 8: other two grandparents of the selected family, 
identified as to family and individual. 


In running the program, the information on the cards mentioned 
under (a) is stored in p(p + 1)/2 consecutive drum memory locations. 
Likewise the information on the (b) cards is stored in 5p consecutive 
drum locations. The program itself occupies 340 locations. The 
machine systematically computes the covariances of each pair of newly 
selected families by calculating the average of 16 grandparental co- 
variances as outlined above. If two grandparents happen to be identical, 
the machine will add to the covariance the 1 which is omitted on the 
(a) cards, and if two grandparents are sibs, the machine adds 1 to the 
covariance and divides the resulting number by 2. 
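
The computation just described can be sketched in a modern language. The following Python fragment is only an illustration of the averaging rule and of the two special cases (identical grandparents and full-sib grandparents); it is not a transcription of the IBM-650 program. The data layout is an assumption made here: each grandparent is a `(family, individual)` pair, each selected family is a tuple of its four grandparents in the order A, B, C, D, and the covariance table is keyed by sets of family identifiers with complete covariances on the diagonal (the 1 is not omitted as it is on the cards).

```python
from itertools import product


def grandparent_cov(g1, g2, table):
    """Covariance of two grandparental individuals, each a (family, individual) pair."""
    (fam1, ind1), (fam2, ind2) = g1, g2
    if fam1 != fam2:
        # Random members of two different families: off-diagonal entry.
        return table[frozenset((fam1, fam2))]
    diagonal = table[frozenset((fam1,))]
    # Same individual: diagonal entry; full sibs: diagonal entry divided by two.
    return diagonal if ind1 == ind2 else diagonal / 2.0


def family_cov(p, q, table):
    """Average of the 16 grandparental covariances between families p and q."""
    return sum(grandparent_cov(g, h, table) for g, h in product(p, q)) / 16.0


def self_cov(p, table):
    """Covariance of a member of p with itself: 1 + (1/2) cov(X, Y)."""
    a, b, c, d = p  # a, b are the parents of X; c, d are the parents of Y
    total = sum(grandparent_cov(g, h, table) for g, h in product((a, b), (c, d)))
    return 1.0 + total / 8.0


def inbreeding(p, table):
    """Inbreeding coefficient of a member of p: half the covariance of its parents."""
    return self_cov(p, table) - 1.0
```

As a check on the logic, if the grandparental families are unrelated and non-inbred (diagonal entries 1, off-diagonal entries 0) and one grandparent on each side comes from the same family but is a different individual, the parents are first cousins and `inbreeding` returns 1/16.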

The machine will punch out one card for each entry of a new p X p 
table of covariance for all p(p + 1)/2 pairs of newly selected families. 
Each output card will contain the following information: 


in word 1: 000000hijk, the same as on type (a) input cards, 

in word 2: a number corresponding to the table entry (row and 
column number), 

in word 3: year and population number, 

in word 4: parents of family S, say, 

in word 5: parents of family T, say, 

in word 8: ---xyzw, where xyzw is the calculated covariance of a 
random member of S with a random member of T, or, 
if S = T, the covariance of a random member of S with 
itself. Columns 71-76, indicated as ---, are left blank 
by the machine. The identification numbers of the 
newly selected families are punched in these columns by 
hand, and the cards sorted in numerical order on 
columns 76, ..., 71 successively. When this has been 
done, the output cards can be used as input cards of 
type (a) for the next selection cycle. 


If 25 families are selected in each cycle (p = 25), the entire program 
occupies approximately 800 drum locations, and its running time is 
20 minutes. 
149 NOTE: On Optimum Family Size in 
Selection Programmes 


ALAN ROBERTSON 
A.R.C. Unit of Animal Genetics, West Mains Road, Edinburgh, Scotland. 


In a recent publication (Robertson, [1957]) the problem of the 
optimum structure of a breeding programme using progeny-testing or 
family selection was discussed in general terms. It was shown that by 
the introduction of a factor K (the number of animals N whose per- 
formance could be tested divided by the number of groups which would 
be selected S) a general solution could be given. 

When the groups are of half-sibs and it is known that there are no 
non-genetic between-group variations, it can be shown that the expected 
improvement is a function solely of the proportion of groups selected 
p and K/a, where a = (4 − h²)/h², h² being the heritability of the 
measurement on which selection is based. We have for the expected 
genetic superiority of chosen sires 

\[ \Delta G = \sigma_G \, \frac{z}{p} \sqrt{\frac{Kp}{Kp + a}}, \]

where z is the ordinate of the unit normal curve at the point at which 
the area cut off is p. The determination of the structure for optimum 
improvement is then a problem of maximizing ΔG in terms of p. At 
the maximum, it was found that 

\[ \frac{K}{a} = \frac{1}{2p} \cdot \frac{2px - z}{z - px}, \]

where x is now the abscissa of the normal curve at the point of cut-off. 
It was thus possible to calculate the optimum value of p, and conse- 
quently the maximum value of ΔG, for different values of K/a. 
It was found that, as expected, ΔG increases as K/a increases but 
also, rather surprisingly, that above a value of K/a of about 8, a given 
proportional increase in K/a produced a constant absolute increase in 
ΔG. This relationship remained at an empirical level in the original 
paper. It is the function of the present paper to investigate further 
the nature of this relationship and to use it to consider further some of 
the problems of population structure. For a given value of K/a, p is 
then determined by the above relationship. 

Substituting in the expression for ΔG, we have that at the optimum 

\[ \Delta G = \sigma_G \, \frac{\sqrt{z(2px - z)}}{p}. \]

The algebraic expression of the empirical relationship would be 

\[ \Delta G = A + B \log_e (K/a). \]

If this held true, then we should have 

\[ d(\Delta G)/d(K/a) = B(a/K). \]

We can in fact find the value of the differential as a function of p, by 
evaluating d(ΔG)/dp and d(K/a)/dp. After much algebra we find that 

\[ \frac{d(\Delta G)}{d(K/a)} = (a/K) f(p). \]


If we plot f(p) against p, we find that it is zero when p = 0.27 
(when 2px − z = 0), rises to a maximum when p is in the neighborhood 
of 0.07, and then declines slowly to zero as p approaches zero. But over 
a very wide range of p, f(p) changes very little. Thus, from p = 0.16 
to p = 0.006, corresponding to K/a values from 3 to 500, f(p) lies 
between 0.285 and 0.325. It follows then that the apparent linearity 
of the plot of ΔG against log K/a is an algebraic accident in that the 
linearity holds merely over a wide range of K/a. This does not however 
alter the fact that over the range of K/a from 3 to 500, we may with 
reasonable accuracy write the superiority of the sires of the chosen 
groups when the structure is optimum as 

\[ \Delta G = \sigma_G [0.5 + 0.3 \log_e (K/a)]. \]
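
The near-constancy of f(p) can be checked numerically. The sketch below rests on an assumption about the underlying model that is not spelled out in this note: that the expected superiority, in units of σ_G, is (z/p)·sqrt(Kp/(Kp + a)), i.e. selection intensity times the accuracy of a half-sib group mean of size Kp. Under that assumption the optimum condition, the optimum gain and f(p) take the closed forms coded here; all function names are illustrative.

```python
import math
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()


def cutoff(p):
    """Abscissa x and ordinate z of the unit normal where the upper tail area is p."""
    x = _UNIT_NORMAL.inv_cdf(1.0 - p)
    return x, _UNIT_NORMAL.pdf(x)


def k_over_a(p):
    """Value of K/a for which p is the optimum proportion selected (p below about 0.27)."""
    x, z = cutoff(p)
    return (2.0 * p * x - z) / (2.0 * p * (z - p * x))


def optimum_gain(p):
    """Superiority of the chosen sires at the optimum, in units of sigma_G (assumed model)."""
    x, z = cutoff(p)
    return math.sqrt(z * (2.0 * p * x - z)) / p


def f(p):
    """(K/a) times the derivative of the optimum gain with respect to K/a (assumed model)."""
    x, z = cutoff(p)
    return (z - p * x) / p * math.sqrt((2.0 * p * x - z) / z)


if __name__ == "__main__":
    for p in (0.16, 0.07, 0.05, 0.006):
        approx = 0.5 + 0.3 * math.log(k_over_a(p))
        print(p, round(k_over_a(p), 1), round(f(p), 3),
              round(optimum_gain(p), 3), round(approx, 3))
```

With these forms f(p) stays inside the band quoted in the text between p = 0.16 and p = 0.006, and the logarithmic approximation tracks the exact optimum gain to within roughly a tenth of σ_G over the same range.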
There is an odd consequence of this relationship. The number of 
groups chosen each generation S is generally fixed by the inbreeding 
depression that it is thought the population could stand. Can the 
optimum value of S be determined a priori? For given values of N 
and h², any increase in S reduces ΔG when the structure is optimum, 
but also reduces the total amount of inbreeding depression ΔP. The 
best value of S is that which gives the greatest value of ΔG − ΔP. 

Let us consider a programme of half-sib family selection in which 
we select the progeny groups of S sires in each generation. The increase 
in homozygosis each generation is 1/8S. If D is the inbreeding depres- 
sion for each unit of inbreeding, then the inbreeding depression each 
generation will be D/8S. As we deal with the selection of half-sib 
groups, then we have 

\[ \Delta G = \tfrac{1}{2} \sigma_G [0.5 + 0.3 \log_e (N/Sa)]. \]

At the optimum, we will have d(ΔG − ΔP)/dS = 0, or −0.15(σ_G/S) + 
(D/8S²) = 0, giving for the optimum value of S, 

\[ S = \frac{5D}{6\sigma_G}. \]

In a progeny-testing programme, in which the greater part of the 
improvement comes from breeding young males for testing by proven 
sires from which males would be bred each generation, the inbreeding 
depression would be reduced by a factor of 4 and the genetic improve- 
ment per generation only by a factor of two, so that the optimum value 
would be twice that given above. Perhaps the reader should be warned 
against a too literal interpretation of the above formulae. Inbreeding 
depression generally affects many other aspects of an animal besides 
those to which selection is immediately directed and in any practical 
application the value of D would have to be modified accordingly. 
However, it is interesting and rather surprising that the optimum 
number of families selected is independent of the total amount of test- 
ing facilities. This will of course lose its relevance if the population is 
to be used mainly for crossing purposes so that the purebred perform- 
ance is not of great importance. 


REFERENCE 


Robertson, A. [1957]. Optimum Group Size in Progeny Testing and Family Selec- 
tion. Biometrics 13, 442-450. 




QUERIES AND NOTES 


150 NOTE: On Heuristic Estimation Methods 


MALCOLM TURNER 
Medical College of Virginia, 
Richmond, Virginia, U.S.A. 


It is quite common practice in presenting a lecture series on general 
statistics to follow a discussion of frequency distributions by a description 
of some measures of “central tendency”, such as the arithmetic mean, 
the median, and the midrange. Then may be presented certain measures 
of “dispersion”, such as the standard deviation, the mean deviation, and 
the range. 

Later on in the series one may discuss the theory of estimation, 
during which discussion the method of maximum likelihood will likely 
play a prominent role. In addition some “heuristic” methods of esti- 
mation may be described. The method of least squares will certainly 
be given. Certain other methods, such as the method of minimum 
sum of absolute deviations and the method of minimum largest absolute 
deviation, will possibly be mentioned as little-applied alternatives. 
(See Whittaker and Robinson [1944] for a brief discussion of these 
methods.) 

It is interesting that a unified presentation of all of these methods 
of estimation and of all of the above mentioned measures of central 
tendency and of dispersion may be given by reference to a single sym- 
metrical distribution. 

Let us consider the following density function: 

\[ f(x) = \frac{\gamma}{2\beta\,\Gamma(1/\gamma)} \exp\left( -\frac{|x - \alpha|^{\gamma}}{\beta^{\gamma}} \right). \tag{1} \]

This density function is obviously symmetrical about the “center” α. 
We shall be interested in four special cases. 

Case I. If we take γ = 1, then we have the double exponential 
distribution, 

\[ f(x) = \frac{1}{2\beta} \, e^{-|x - \alpha|/\beta}. \tag{2} \]

Case II. If we take γ = 2, we have the common normal distribution 

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x - \mu)^2/2\sigma^2}, \tag{3} \]

where β² = 2σ² and α = μ. 

Case III. If we take γ = q, a specified even positive integer, we 
have the q-th-power distribution 

\[ f(x) = \frac{q}{2\beta\,\Gamma(1/q)} \, e^{-(x - \alpha)^q/\beta^q}. \tag{4} \]

Case IV. If we let γ tend toward infinity, we have the rectangular 
distribution 

\[ f(x) = \frac{1}{2\beta}, \qquad \alpha - \beta \le x \le \alpha + \beta. \tag{5} \]

Suppose that we have a sample of size N drawn from each of the four 
special cases. The log likelihoods for the first three cases are then 
given in the following table. 


Distribution          Log likelihood 
double exponential    −log (2β)^N − (1/β) Σ |x_i − α| 
normal                −log (β√π)^N − (1/β²) Σ (x_i − α)² 
q-th power            log q^N − log [2βΓ(1/q)]^N − (1/β^q) Σ (x_i − α)^q 


It is seen that only the last term contains the centrality parameter α. 
Hence to find the maximum likelihood estimates for α we proceed, in 
the case of the double exponential, to minimize the sum of the absolute 
deviations; in the case of the normal, to minimize the sum of the squared 
deviations (least squares); and in the case of the q-th power distribution, 
to minimize the sum of the q-th power of the deviations (least q-th’s). In 
the case of the rectangular distribution we have to minimize 

\[ \lim_{\gamma \to \infty} \sum_{i} (x_i - \alpha)^{\gamma}. \tag{6} \]

It is obvious that in passing to the limit the largest deviation becomes 
dominant. Thus minimizing (6) is equivalent to minimizing 

\[ \sup_i |x_i - \alpha|, \]

providing the estimation procedure known as minimizing the largest 
absolute deviation (least largest deviation). (See Goldstein et al. 
[1957].) 

Application of the four principles to the appropriate distribution 
yields as estimators: the median for the double exponential, the arith- 
metic mean for the normal, the midrange ½(x_(1) + x_(N)) for the rectan- 
gular distribution, and an unnamed least q-th estimator for the q-th 
power distribution. 

Similarly, the principle of maximum likelihood yields the mean 
deviation (about the median) as estimator for β in the case of the double 
exponential. For the normal distribution, β² is estimated by twice 
the square of the standard deviation, and for the rectangular distribution 
β is estimated by one-half of the range. 

The above results are summarized in the following table. 


γ  | Distribution       | Estimation principle for α | Estimate of α    | Estimate of β 
1  | double exponential | least absolute deviations  | median           | mean deviation 
2  | normal             | least squares              | mean             | √2 × standard deviation 
q  | q-th power         | least q-th’s               | (no explicit form) | (no explicit form) 
∞  | rectangular        | least largest deviation    | midrange         | 1/2 range 
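
The correspondence between estimation principle and estimator can be illustrated numerically. The sketch below minimizes each criterion over a fine grid of candidate centers and compares the result with the median, mean and midrange; the grid search, the sample and all names are illustrative assumptions. For the normal case the standard deviation is taken with divisor N (the maximum likelihood form), which is an interpretation on the part of this sketch.

```python
import math
import statistics

# Loss functions in the candidate center a, one per estimation principle.
LOSSES = {
    "least absolute deviations": lambda a, xs: sum(abs(x - a) for x in xs),
    "least squares": lambda a, xs: sum((x - a) ** 2 for x in xs),
    "least largest deviation": lambda a, xs: max(abs(x - a) for x in xs),
}


def grid_argmin(loss, xs, steps=2000):
    """Candidate center minimizing loss over an even grid spanning the sample range."""
    lo, hi = min(xs), max(xs)
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda a: loss(a, xs))


def beta_estimates(xs):
    """Scale estimates matching the last column of the summary table."""
    med = statistics.median(xs)
    return {
        "double exponential": sum(abs(x - med) for x in xs) / len(xs),
        "normal": math.sqrt(2.0) * statistics.pstdev(xs),  # divisor N assumed
        "rectangular": (max(xs) - min(xs)) / 2.0,
    }


SAMPLE = [1.0, 2.0, 4.0, 7.0, 11.0]  # made-up data: median 4, mean 5, midrange 6

if __name__ == "__main__":
    for name, loss in LOSSES.items():
        print(name, grid_argmin(loss, SAMPLE))
    print(beta_estimates(SAMPLE))
```

An odd sample size is used so that the median is unique; with an even sample size any point between the two middle observations minimizes the sum of absolute deviations.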


The author has found this kind of discussion useful in introducing 
the student to the subject of “heuristic” estimation methods. Perhaps 
others will also. 
151 NOTE: A Genetic Application of the 
Schumann-Bradley Table 


STANLEY WEARDEN 


Department of Statistics 
Kansas State University 
Manhattan, Kansas, U.S.A. 


Tables of the ratio of central variance ratios have been presented 
by Schumann and Bradley [1, 2, 3] to compare the sensitivities of ex- 
periments. Bross [4] has given additional applications of the table and 
has suggested that it might be used to test certain genetic hypotheses. 
This note indicates how the table may be used to test a hypothesis of 
equality for two independent estimates of the heritability of a given 
trait ($H_0 : h_1^2 = h_2^2$), when the estimates result from similar experiments. 
The test criterion compared against the appropriate critical value 
$w_0$ of the Schumann-Bradley table is 

\[ w = F_1/F_2, \tag{1} \]

where $F_i$ is the ratio of the mean squares of the ith experiment (i = 1, 2). 
When the variance component model is used in the estimation of herita- 
bility, the estimate is usually a function of the intraclass correlation 
among full or half sibs. Assuming a completely random design and the 
random effects model, i.e. that the ith experiment consists of a random 
sample of $r_i$ genetic groups and a random sample of $k_i$ individuals 
within each genetic group, then the expectations of the two mean 
squares in the analysis of variance are 


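\[ E(\text{Among Groups M.S.}) = \sigma_{w_i}^2 + k_i \sigma_{g_i}^2 \]

and 

\[ E(\text{Within Groups M.S.}) = \sigma_{w_i}^2. \]

The relationship between heritability and the variance ratio is 

\[ h_i^2 = \frac{F_i - 1}{a\{F_i + k_i - 1\}}, \tag{2} \]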


where a is the coefficient of additive genetic relationship (Wright’s 
coefficient under the assumption of additive and autosomal inheritance) 
among progeny within the same parental group. 

Schumann and Bradley have presented the test, utilizing the test 
criterion w as the ratio of the $F_i$ as given in (1); for convenience, the 
test is translated below in terms of the $h_i^2$. The test requires knowledge 
of the “among” degrees of freedom $\nu_{1i}$ and the within degrees of freedom 
$\nu_{2i}$. An additional requirement is that the $k_i$ be constant for each 
group, for the theory underlying the random effects model does not 
carry over to a one-way classification with unequal numbers [5, 6]. 

The statistic w based on the two independent estimates of 
heritability is 

\[ w = \frac{1 + a h_1^2 (k_1 - 1)}{1 - a h_1^2} \cdot \frac{1 - a h_2^2}{1 + a h_2^2 (k_2 - 1)}. \tag{3} \]
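
The translation between heritabilities and the test criterion can be sketched as follows. The code assumes that the intraclass correlation equals a·h² and that the variance ratio is related to it by F = [1 + a h²(k − 1)]/(1 − a h²), which is the relation implied by (3); the function names and the numerical values in the example are illustrative only.

```python
def variance_ratio(h2, k, a):
    """Variance ratio F implied by heritability h2, group size k and relationship a."""
    t = a * h2  # intraclass correlation among members of a group
    return (1.0 + t * (k - 1.0)) / (1.0 - t)


def heritability(f_ratio, k, a):
    """Heritability recovered from a variance ratio: (F - 1) / (a (F + k - 1))."""
    return (f_ratio - 1.0) / (a * (f_ratio + k - 1.0))


def w_statistic(h2_1, k1, h2_2, k2, a):
    """Test criterion w = F1/F2 expressed through the two heritability estimates."""
    return variance_ratio(h2_1, k1, a) / variance_ratio(h2_2, k2, a)


if __name__ == "__main__":
    # Half-sib groups (a = 1/4) of ten individuals; the heritabilities are made up.
    print(w_statistic(0.40, 10, 0.25, 10, a=0.25))
```

The value of w obtained in this way is then compared with the tabulated critical value in the usual manner.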


The Schumann-Bradley test [3] will give an exact test of $H_0 : h_1^2 = h_2^2$ 
(with α = .05, .025, .01 or .005 when $H_1 : h_1^2 > h_2^2$, or α = .10, .05, 
.02 or .01 when $H_1 : h_1^2 \ne h_2^2$) so long as $\nu_{11} = \nu_{12}$ and $\nu_{21} = \nu_{22}$. For a 
good approximate test, the following conditions (based on reference 
[1], equations 25 through 28) must be met: 

1. 1, 


The table may be entered with $2a = \tfrac{1}{2}(\nu_{11} + \nu_{12})$ and $2b = \tfrac{1}{2}(\nu_{21} + \nu_{22})$. 
BOOK REVIEWS 


J. G. SKELLAM, Editor 
Members are invited to suggest books for listing or review to the Editor 


1 R. H. OSBORNE AND F. V. DE GEORGE. Genetic Basis of 
Morphological Variation. Cambridge, Mass.: Harvard University 
Press, 1959. Pp. xxii + 204, Tables and Appendices. $5.00. 


M. N. Karn, Galton Laboratory, University College, London, England 


The sub-title of this volume is An evaluation and application of 
the twin study method; the study was conducted at the Columbia Pres- 
byterian Medical Center under the auspices of the Institute for the 
Study of Human Variation, Columbia University. 

The subject is introduced by a Foreword from Dr. T. Dobzhansky, 
followed by a Preface, and Contents and the titles of all the 92 tables 
contained in the text, from which the reader can see that the main 
interest of this book is anthropometrical. 

There are five Parts, of unequal length, but the work reads as a 
continuous whole. Part I (1 chapter) is the Introduction; Part II 
(3 chapters) is an evaluation of the twin study method; Part III (4 
chapters) deals with the design of the study of normal morphological 
variation; Parts IV and V (6 chapters) contain the analysis of mor- 
phological variation with Summary and Conclusions. 

The appendices include the form of schedule filled in for the twin 
data, with personal history, family history, physical examination sheet, 
abnormal and important findings, laboratory reports and tests, obser- 
vations on hair, eyes, teeth etc., and 54 body and head measurements. 

It may be noticed that there is only slight reference to mental 
capacity or emotional characteristics in this schedule. The investi- 
gation is essentially one of physical characters. 

There are further appendices referring to the data, followed by 
References and Index; and finally a set of outline drawings of the 
masculine and feminine figures in series showing the characteristics 
of each sex in the shape of the various measurements taken. 

Parts IV and V, covering about half of the book, are the important 
chapters from the biometrical standpoint, with the statistical analysis 
of the data and the tables and the interesting results flowing from them. 

Part I is a historical introduction to the genetic study of morpho- 
logical variation from Mendel, Galton, Darwin, Pearson, Fisher, Mather, 
Penrose and others to the present decade where the situation with 
regard to evolution, Mendelism and variation is summed up by Dob- 
zhansky (1955) in the words “Evolution is a gradual and continual 
process, but it results from the summation of many discontinuous 
changes, mutations, a great majority of which are small.” 

The development of the idea of the relationship between qualitative 
and quantitative variability is shown in the example of the well-known 
taste-testing of PTC which was originally treated as qualitative, but 
is now recognised as quantitative based upon a continuously graduated 
solution. The authors give also Lush’s apt illustration of the spectrum: 
“the colours of the spectrum when viewed by the human eye are placed 
upon a qualitative scale, but if measured in terms of the length of 
light waves, the scale becomes quantitative.” 

Thus the former separation of variation into two types, continuous 
and discontinuous, is no longer necessary in the light of modern genetics; 
a term such as polygenic or multifactorial is to be preferred to quali- 
tative or quantitative. 

Part II introduces the idea of the twin study method, in its historical 
setting, recognised by Galton (1875) as a possible way of evaluating 
the relative roles of nature and nurture. With the well-known designa- 
tions of twins as MZ or DZ, and as concordant if the two of a pair 
are alike with regard to some condition or disease, Holzinger’s formula 
for the heritability of the disease can be calculated as a percentage 
which gives a qualitative value; but this formula takes no account 
of quantitative within-pair differences or heterogeneity within the 
monozygotic or dizygotic groups. 

The case of Mongolian idiocy is quoted as showing that formulae 
of the above type cannot discriminate between the environmental 
and hereditary factors; the heritability estimate is given by Neel 
and Schull (1954) as 0.881, one of the highest heritability values for 
a congenital defect. Maternal age and health have, however, been 
established as important environmental factors in both Mongolism 
and dizygotic twinning (Penrose). 

The next chapter on obtaining a twin sample, and the diagnosis 
of zygoticity (this form of adjectival noun is better than zygosity) 
emphasises the difficulty of avoiding a bias in the data. There is a 
high frequency of congenital abnormalities and other diseases and a 
high infant mortality which means that any data of adult twins is 
a highly selected sample of the number of twins born. 

The diagnosis of zygoticity has been one of much research; the 
best method up to date seems to be that of Smith and Penrose (1955) 
which is based on blood groups, finger ridge counts and palmar angle. 

Part III gives a description of the present data. Many twin studies 
have been based on data of school children with the inevitable large 
standard deviations on account of age. The present data contain only 
adult twins, and consisted originally of 340 pairs obtained through 
the cooperation of patients of the Vanderbilt Clinic of the Columbia 
Presbyterian Medical Center in New York. They were selected as 
being over 18 years of age, and on condition that they were both living 
and available for observation. A sample of only 131 pairs remained 
as the basis of this study, and even this number was further reduced 
as 16 were found to have impaired health, and 3 others to be under 
age; so the final number was 112, compiled as 


MZ 25 34 — 59 
DZ 10 27 16 53 


Compared with previous series the total is small; other studies have 
included 1400, 900 and several sets over 200. 

An interesting table of the comparative frequencies of the sexes, 
and of zygoticity shows that a twin sample is seldom random as regards 
sex, the females always being in excess. 

Some vital statistics of the single births in the hospital show the 
frequencies as well as the twin incidence of abortions. 

With regard to the method used for the reduction of the data, 
an analysis of variance is made. 

The use of measurement error and interpair variances is a new 
device to help in the interpretation of twin data by the separation 
of the genetical and environmental components of variation. 

The measurement error (ME) was obtained by duplicating some 
of the measurements taken; 7 single subjects out of the monozygotic 
and 21 out of the dizygotic pairs were measured twice. 

Intrapair variances consist of measurement errors and the results 
of environmental and genetic influences. 

The data used embrace stature, weight, and a complete set of body 
and head measurements. 

The method of reduction is uniform for all these measurements 
which makes possible a rapid interpretation of the results with a glance 
at any one of the 60 or more tables (a), (b). 

The form of tables (a) gives mean variances, F ratio and P for
measurement error (ME), monozygotic, dizygotic, and interpair (IP),
for males and for females. For the F ratio the results recorded are
MZ : ME, DZ : MZ, IP : DZ.

Tables (b) give the same variances arranged to make a sex com-
parison for MZ and DZ, introducing the variance for unlike-sex pairs
of DZ.

The marginal description in the tables seems to be at fault in giving
ME : MZ as the F ratio instead of MZ : ME; similarly the ratios of 
the other designations have been inverted with the exception of two 
cases in tables (b) which are correct. 

Correlation tables are also given for many of the pairs of measure-
ments for the twin groups.

A summary is given in a comprehensive table with particular mea-
surements showing both genetic and environmental influences, genetic
only, or environmental only, sex influence and sex difference.

Of 55 anthropometric measurements the following numbers show:


                               males females
genetic influence only 11 11 
environmental influence only 2% 13 
both influences 29 


The females are on the whole more responsive to both influences 
which is a principle of human data noted by earlier investigators. 

A new departure for twin measurements is that of fat, bone and 
muscle, and assessing of body build. 

Tables in the standard form as previously given show mean variances 
and sex comparison. 

An interesting result from this is that upper arm diameter, which
is a measurement of muscle mass, provides an extremely satisfactory
indication of a genetic component of variability, just as wrist breadth
does for genetic variability in bone size.

A chapter on body build follows, with tables again in the pattern 
used throughout, and some investigation of masculine and feminine 
characteristics in the various measured parts of the body. 

A set of references and a short index completes the study, altogether 
a very extensive investigation, with some definite contributions to 
the parts played by heredity and environment in morphological varia- 
tion; a book as perfect in production as could be desired, and most 
readable.


B. BENJAMIN. Elements of Vital Statistics. Chicago, Illinois:
Quadrangle Books Inc. $10.00; London: Allen and Unwin Ltd.,
45s., 1959. Pp. 352.


W. J. Martin, London School of Hygiene and Tropical Medicine, 
London, England 


The last revision of Newsholme’s classic Elements of Vital Statistics, 
which has retained its appeal to workers in the fields of public health 
and vital statistics for over 30 years, appeared in 1923. The new 
developments and approaches to the subject since then and the in- 
creased interest in morbidity as against mortality made it almost 
impossible to produce a further revision in the same form, and Dr. 
Benjamin wisely decided to re-write the book. The new version is 
intended to appeal to the same circle of readers; the descriptive style 
of Newsholme has been followed, no knowledge of statistics or mathe- 
matics is required and no attempt has been made to introduce metho- 
dology. The extensive list of references at the end of each chapter 
will be invaluable to the general reader who wishes to extend his studies 
on any aspect of vital statistics. 

The book may be divided into two parts: indices published regularly 
in the reports of the Registrar General of England and Wales, and 
morbidity in general. The first seven chapters deal with population, 
registration of deaths, births and marriages, and with mortality rates 
and causes. A short history, which precedes the chapters on the census, 
the registration of births, marriages and deaths, the notification of 
infectious diseases and the advances made in these subjects, gives 
added interest. The discussions on mortality indices and the con- 
struction of life tables are good but, perhaps, a little too condensed 
for anyone coming fresh to the subject. 

The next ten chapters deal with morbidity in general, including 
maternal and child welfare, health of the school child, industrial and 
general incapacity, and the recent developments in the field of vital 
statistics—the systems for registration of cancer and mental disorders, 
hospital statistics, studies in general practice and field surveys. Chapter 
8, Measurement of Morbidity, which introduces the sequence of studies 
of disease, could have been expanded considerably with profit to the 
reader. An excellent chapter on tuberculosis reviews the tuberculosis 
services, the problem of control, follow up, the defects of notifications 
and the difficulties of comparing notifications in different localities. 

The methods used in England and Wales for the registration, col-
lection and presentation of some indices are contrasted with those in
America. The validity of international comparisons and the recom-
mendations of international bodies on classification are considered. 

The author has been successful in his attempt to condense all the 
relevant facts on vital statistics. The later chapters dealing with 
modern developments, of which the author has considerable knowledge 
and experience, are of particular interest as these topics have not 
been brought together before. All who have an interest in vital statis- 
tics either from the medical or social aspect will find this book, based 
on a mass of information collected from many sources, a very useful 
work of reference and a comprehensive guide.


ABSTRACTS


The following are abstracts of papers presented at the sixth Biometric 
Colloquy of the German Region at Leipzig, January 23-25, 1959. 


652 W. U. BEHRENS (Hannover). Analysis of Multi-factorial
Experiments.


Two-factor experiments with a × b levels can be analysed according
to one of two models (cf. Eisenhart): Model I pertains to a population 
of second- and higher-order interactions with the scope of extracting 
treatment and variety effects; Model II presumes a population of 
sampled experimental units, e.g. characters of an animal species sampled 
in different herds at different times, with the scope of estimating a 
general mean and variances. The models differ with regard to math- 
ematical concept, mean-square expectations, and interpretation of 
analysis of variance. With Model II the calculation of variance com- 
ponents does not meet with difficulties, though only large values of a 
and b yield precise estimates. The following expositions are based on 
Model I.—Here follows a demonstration (two-factor experiments with 
or without replications) of which mean squares are to be used in tests 
for significant differences in levels of factor A when considering (1) 
fixed levels (e.g. preassigned locations) of factor B, (2) the overall sum 
of levels of factor B, and (3) unfixed levels (e.g. any chosen locations) 
of B—Extension to three-factor experiments. 
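
The choice of error term for cases (1) and (3) can be sketched with the usual textbook rule, which is assumed here because the abstract gives no formulae: with replication, factor A is tested against the within-cell mean square when the levels of B are fixed, and against the A × B interaction mean square when the levels of B are a random choice. The function names and the small data set are hypothetical, and case (2) is not covered.

```python
from statistics import mean


def two_way_mean_squares(data):
    """Mean squares for a two-factor layout with equal replication.

    data[i][j] is the list of replicate observations for level i of
    factor A and level j of factor B.
    """
    a, b, r = len(data), len(data[0]), len(data[0][0])
    grand = mean(x for row in data for cell in row for x in cell)
    a_means = [mean(x for cell in row for x in cell) for row in data]
    b_means = [mean(x for row in data for x in row[j]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in data]

    ss_a = b * r * sum((m - grand) ** 2 for m in a_means)
    ss_ab = r * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_error = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in data[i][j]
    )
    return {
        "A": ss_a / (a - 1),
        "AB": ss_ab / ((a - 1) * (b - 1)),
        "error": ss_error / (a * b * (r - 1)),
    }


def f_ratio_for_a(ms, b_levels_fixed):
    """F ratio for factor A; the denominator depends on how B was chosen."""
    denominator = ms["error"] if b_levels_fixed else ms["AB"]
    return ms["A"] / denominator


# Made-up data: 2 levels of A, 2 levels of B, 2 replicates per cell.
example = [[[10, 12], [20, 22]], [[14, 16], [30, 32]]]
ms = two_way_mean_squares(example)
print(f_ratio_for_a(ms, b_levels_fixed=True))   # A against within-cell error
print(f_ratio_for_a(ms, b_levels_fixed=False))  # A against A x B interaction
```

With these figures the same difference between levels of A gives a much smaller F ratio once the interaction is taken as the yardstick, which is why the distinction between fixed and unfixed levels of B matters.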


653 H. RUNDFELDT (Hannover). Notes on the Evaluation of
Non-orthogonal Experiments on an Electronic Computer.


The extensive calculations in the evaluation of non-orthogonal 
experiments necessitate the use of electronic computing equipment in 
many circumstances. Either of two procedures, namely (1) the evalua- 
tion of main effects and variances from a system of equations based on 
experimental values or (2) estimation of missing values and orthogonal 
analysis of the completed table, yields the same results. 
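
The agreement of the two procedures can be sketched for the simplest case, a blocks × treatments table with one missing plot. This is an illustration under stated assumptions, not the author's programme: procedure (2) is represented by the classical missing-plot formula x = (rB + tT − G) / ((r − 1)(t − 1)), procedure (1) by a direct least-squares fit of the additive model to the observed plots, and all names and figures are hypothetical.

```python
def missing_plot_value(table, i0, j0):
    """Classical estimate for one missing cell (i0, j0) of an r x t table."""
    r, t = len(table), len(table[0])
    block_total = sum(x for j, x in enumerate(table[i0]) if j != j0)
    treat_total = sum(table[i][j0] for i in range(r) if i != i0)
    grand_total = sum(
        table[i][j] for i in range(r) for j in range(t) if (i, j) != (i0, j0)
    )
    return (r * block_total + t * treat_total - grand_total) / ((r - 1) * (t - 1))


def solve(a, y):
    """Solve a square linear system by Gauss-Jordan elimination."""
    n = len(y)
    m = [row[:] + [y[k]] for k, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(m[k][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for k in range(n):
            if k != c:
                f = m[k][c] / m[c][c]
                m[k] = [u - f * v for u, v in zip(m[k], m[c])]
    return [m[k][n] / m[k][k] for k in range(n)]


def least_squares_prediction(table, i0, j0):
    """Fit y = mu + block_i + treatment_j to the observed cells only and
    return the fitted value for the missing cell (i0, j0)."""
    r, t = len(table), len(table[0])
    p = r + t - 1  # mu, r - 1 block effects, t - 1 treatment effects

    def design_row(i, j):
        x = [0.0] * p
        x[0] = 1.0
        if i > 0:
            x[i] = 1.0            # block effect (first block is baseline)
        if j > 0:
            x[r - 1 + j] = 1.0    # treatment effect (first is baseline)
        return x

    rows, obs = [], []
    for i in range(r):
        for j in range(t):
            if (i, j) != (i0, j0):
                rows.append(design_row(i, j))
                obs.append(float(table[i][j]))
    xtx = [[sum(x[u] * x[v] for x in rows) for v in range(p)] for u in range(p)]
    xty = [sum(x[u] * y for x, y in zip(rows, obs)) for u in range(p)]
    beta = solve(xtx, xty)
    return sum(c * b for c, b in zip(design_row(i0, j0), beta))


# Made-up 3 x 3 table; None marks the missing plot.
plots = [[10, 12, 15], [11, 14, None], [13, 15, 19]]
print(missing_plot_value(plots, 1, 2))
print(least_squares_prediction(plots, 1, 2))
```

Both calls return the same value for the missing plot, so an orthogonal analysis of the completed table reproduces the least-squares estimates of the effects.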


654 B. SCHNEIDER (Karlsruhe). Covariance Adjustments by
Orthogonal Polynomials. 


The effective removal of field trends by covariance methods fre-
quently compels the use of higher than first-order parabolas. A method
is given and demonstrated for the reduction of calculational labor by
using Lorenz polynomials in setting up stepwise the orthogonal poly-
nomials for fitting the parabola.
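
The stepwise idea can be sketched as follows. The Lorenz polynomials of the abstract are not reproduced here; ordinary Gram-Schmidt orthogonalisation of the powers of the plot position is swapped in as a stand-in, and the function names are hypothetical. Because the columns are mutually orthogonal, each coefficient is found on its own and a higher-order term can be added without recomputing the lower ones.

```python
def orthogonal_polynomials(positions, degree):
    """Columns p_0 .. p_degree, mutually orthogonal over the positions."""
    columns = []
    for d in range(degree + 1):
        v = [float(x) ** d for x in positions]
        for q in columns:
            # Remove the part of v already explained by a lower-order column.
            coef = sum(a * b for a, b in zip(v, q)) / sum(b * b for b in q)
            v = [a - coef * b for a, b in zip(v, q)]
        columns.append(v)
    return columns


def fit_trend(y, columns):
    """Coefficients and fitted values; each coefficient is independent."""
    coefs = [
        sum(a * b for a, b in zip(y, q)) / sum(b * b for b in q)
        for q in columns
    ]
    fitted = [
        sum(c * q[k] for c, q in zip(coefs, columns)) for k in range(len(y))
    ]
    return coefs, fitted


# Five equally spaced plots and a made-up yield trend.
cols = orthogonal_polynomials([1, 2, 3, 4, 5], degree=2)
coefs, fitted = fit_trend([1, 4, 9, 16, 25], cols)
print(cols[2])
print(coefs)
```

The fitted values would then serve as the covariate, or be subtracted from the plot yields, before treatment means are compared.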


655 W. SEYFFERT (Berlin). The Structure of Polyploid Populations.


The composition with respect to one or two loci of tetrasomic 
populations is analysed under the assumptions of panmixia, infinite 
population size, no selection nor mutation, and for unspecified double 
reduction values. The composition of equilibrium populations and the 
rate of approach to equilibrium are derived and also the rate of approach 
to homozygosity in selfing di- and tetrasomic populations without and 
under selection. Discussion of the impact of population genetics on 
plant breeding is included. 


656 H. GUTZ (Berlin). The Use of Yeasts as Model Organisms in
Studies on Population Genetics.


Compared with bacteria and higher organisms, yeasts are advan-
tageous in population genetics in that they have a rapid asexual
multiplication and can be induced to propagate sexually as well. A
simple technique provides for isolation of lots of single ascospores which
can be brought to copulation. This paper is a demonstration of some
experiments in population genetics of yeasts.


657 H. PERNITZSCH (Weimar). Estimation of Heritability in
Animal Breeding.


Discussions of scope, use, and method of estimation of heritability 
in animal breeding are given. 


658 R. WARTMANN (Düsseldorf). Several Solutions of Some
Problems in Correlation Analysis.


Some problems in two-dimensional linear correlation are discussed
and solved for cases where the usual regression method does not apply
because its prerequisites are not met, e.g. the identification of a
structural relationship when the structural variates are masked by errors.


659 W. LUDWIG and V. BANERJEE (Heidelberg). On the Theory
and Empirical Verification of Pearson’s Coefficient of Variation.


The coefficient of variation V is widely used in biology and other
disciplines. In unpublished work with Wartmann it has been shown
that without further assumptions the postulated constancy of V for
body measurements etc. cannot be expected. If on the other hand the
growth of the body and its parts of individuals of different mean size 
belonging to different related species is assumed to follow a Feller- 
Arley process, an asymptotic constancy of V follows. Length and 
weight measurements of body parts (trunk and disarticulated legs) 
taken from two homogeneous populations of water-runners showed 
essentially a constant V for lengths and weights, the latter ranging 
through about 1: 2,000. 


660 O. LUDWIG (Bad Nauheim). An Application of the Theory of 
Extreme Values in Experimental Medicine.


Results from the stochastic theory of extreme values for small 
samples, not necessarily assuming normal distributions, were used to 
compare mean values of times required for the dissolution of the first 
of some pills of dye-stuff in acids (human gastric juice). General 
inequalities for order statistics and further applications are mentioned. 
Details will be published in Biometrische Zeitschrift. 


661 G. A. LIENERT (Marburg). The Use of Factor Analysis in 
Pathogenical Studies. 


An evaluation by factor-analysis of the numerous measurable 
macro-anatomical and histological changes in organs and tissues of 
autoptic observations, grouped according to relatively homogeneous 
disease-complexes, is advocated. The factors extracted should be 
interpretable as independent formal causes of the changes observed. 
The results of such analyses should lead to a nosological recategorisation 
on a quantitative basis or to a revision of the traditional systematization
in pathology. 


662 R. K. BAUER (Krefeld). On the Reduction of the Number of 
Characters in Multivariate Analysis. 


Discussion and review of rationale and method in multivariate 
analysis and discrimination analysis for the choice and reduction of the 
ensemble of characters relevant for the problem under consideration 
is presented. 


663 MALY (Prag). Sequential Estimation of the Parameter of 
the Binomial Distribution. 


In many papers three-sided sequential procedures for normal
and other distributions are given. It is possible to construct
the sequential test of acceptance for one of k exclusive hypotheses
concerning the parameter p of the binomial distribution by carrying
out k − 1 independent sequential two-decision tests. If k → ∞, we
have a sequential procedure for the estimation of p; if p_0 is the true
value of the parameter of the binomial distribution and δ a given
constant, the sequential procedure has the property that the prob-
ability of acceptance of p outside the interval (p_0 − δ, p_0 + δ) can be
chosen arbitrarily.


664 H. RUNDFELDT (Hannover). On an Improved Method for the 
Evaluation of Single Trials. 


It is shown that the removal of field trends under the field plot 
trial and latin square hypotheses is not optimal with respect to the 
estimation of mean values. It is suggested to use only the central 
plot value as an estimate for block differences and to estimate the 
remaining values by interpolating with a cubic parabola. The
improvement is supposed to amount to about 5-10%.


665 JANOSCHEK (Giessen). The Kinetics of Biological Reactions
Dependent on Several Factors. 


Most biological reactions are dependent on several simultaneously 
acting additive and/or multiplicative factors. Effective factors can 
be isolated statistically and a factorisation of reaction kinetics is possible. 


666 U. HACKENBERG (Brackwede) and B. SCHNEIDER (Karlsruhe).
Rationalisation of Pharmacological Studies by a Special
Numerical System (WL-24-system).


The usefulness of the previously published WL-24-system of geo-
metric progression numbers is demonstrated and interpreted as
partially due to the fact that these numbers constitute a group.


667 B. SCHNEIDER (Karlsruhe). On the Biomathematics of
Excitation. 


Review is given on the deterministic models of nervous excitation 
of Nernst, Hill, Blair, and Rashevsky and the more recent stochastic 
approaches of v. Schelling, Rapoport, Shimbel, Solomonoff, et al. In 
these models the spreading of nervous excitation can be interpreted as 
special branching processes (random networks), yielding a class of 
distribution laws governing the excitation. Knowledge of these dis-
tributions permits statements about the course of excitation, and vice
versa. If the mathematical model is adapted to the procedure of
measuring excitation, promising results follow; this is demonstrated
for e.e.g.-curves.


668 H. L. LE ROY (Zürich). Gene Effects and Genotypic Correlation
between Individuals in a Random Mating Population.


Review of Fisher’s, Malécot’s and Kempthorne’s work on population 
genetics of random mating and interallelic gene interactions (epistasis) 
is the basis of this paper. 


669 H. L. LE ROY (Zürich). Variance and Regression Analysis as
a Means for Characterizing Genetic and Environmental Effects.


A formal demonstration is given of how the components of variance
in a breeding experiment can be obtained from mean squares by use of
Kempthorne’s formulae and of how to interpret them in terms of
genetic and environmental effects.


670 L. OSADNIK (Leipzig). The Use of Cohort Analysis in Biometry.


Examples of Whelpton’s cohort analysis of fertility measurements
are shown. 


671 F. BURKHARDT (Leipzig). Statics and Dynamics of Biometrical
Developments.


The description of biometrical developments by differential equations
involves two parameters: integration constant and factor of pro-
portionality. Splitting up the differential equation according to an
alternative character gives a system of two differential equations;
eliminating time gives a static relation; splitting according to a second
alternative and estimating both parameters by least squares results
in an inequality which gives insight into the biometry of the process
under consideration. The above system of two differential equations
can be described by a correlation. Splitting according to a second
alternative yields a system of two correlations leading to the same
inequalities obtained formerly.


672 M. J. R. HEALY (Harpenden, Herts.). Analysis of Factorial
Experiments on an Electronic Computer.


The N.R.D.C.-Elliott computer at Rothamsted Experimental
Station is used extensively for the analysis of field experiments, many
of which are factorial in design. Apart from general-purpose pro-
grammes for randomized blocks, Latin squares, etc. we have at present
three programmes designed specifically to analyse factorial experiments.
The first of these is a very general programme for experiments with
up to 6 factors, each with up to 8 levels, without confounding. Any
required multi-way table of means can be printed out, and any required
main effects or interactions can be segregated in the analysis of variance.
This programme is based on Scan and Call-up routines which are of
general interest. The next programme analyses 2^n experiments up to
a maximum of 128 plots. The results provided include the table of
main effects and interactions, any required multi-way tables and
appropriate standard errors. The programme will cope with confound-
ing, whether total or partial, and fractional replication, and can also
analyse 8 × 8 quasi-Latin squares. The third programme analyses
3^3 experiments in single replication. This is a very common layout for
fertilizer trials. Special arrangements are provided for isolating single
degrees of freedom and for providing error estimates from components
of interaction. Certain general facilities are incorporated in these
programmes. These include preliminary computations on the raw
data, allowances for missing plots and covariance analysis.
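
For a 2^n experiment, the table of main effects and interactions can be obtained by the well-known sum-and-difference scheme of Yates. The abstract does not say how the Rothamsted programme is organised, so the sketch below is only an illustration of that hand method; the function name and the plot totals are hypothetical, and confounding, fractional replication and standard errors are left out.

```python
def yates_effects(responses):
    """Mean and factorial effects of an unreplicated 2^n experiment.

    responses must be in standard order: (1), a, b, ab, c, ac, bc, abc, ...
    The result is in the same order: mean, A, B, AB, C, AC, BC, ABC, ...
    """
    n = len(responses)
    k = n.bit_length() - 1
    if n < 2 or 2 ** k != n:
        raise ValueError("number of responses must be a power of two")
    column = list(responses)
    for _ in range(k):
        # One pass: sums of successive pairs, then their differences.
        sums = [column[i] + column[i + 1] for i in range(0, n, 2)]
        diffs = [column[i + 1] - column[i] for i in range(0, n, 2)]
        column = sums + diffs
    mean_response = column[0] / n
    effects = [c / (n / 2) for c in column[1:]]
    return [mean_response] + effects


# Made-up 2 x 2 example in standard order: (1), a, b, ab.
print(yates_effects([10, 14, 12, 20]))
```

After n passes the first entry of the working column is the grand total and every other entry is a contrast total, so a division by half the number of plots gives the effect.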


673 H. A. HACKNESELLNER (Berlin). Statistical Studies on the
Frequency of Malformations in its Relation to Wars and Other
Times of Want, for 1901 through 1955 in Vienna.


Evaluation of the Jahrb. d. Statist. Amtes d. Stadt Wien reveals 
peaks in the frequency of congenital malformation following the first 
world war, the depression of the late twenties, and during and following 
the second world war with an absolute maximum in 1952. 


The following is an abstract of a paper presented to the French Region
in Paris on June 12, 1959. 


674 SULLY (Institut National d’Études Démographiques, Paris).
Factor Analysis Applied to the Study of Life Tables.


Submitting life tables to a factor analysis may help in answering 
two questions: (a) To what extent can a single index synthetize the 
mortality of a given population? (b) What is the minimum number of 
indices necessary for describing such mortality? 

Referring to the psycho-technician’s method, the materials of this
factor analysis are: (1) tests: age groups 0, 1-4, 5-9, ..., 80-84, and
the expectation of life at birth e_0 (or more precisely, its complement
100 − e_0); (2) analyzed individuals: 158 life tables; (3) scores: mortality
rates q_x.

The best way of expressing a life table by a single index is given by
the first principal component, index k. For the 158 life tables under
study, in the optimum case, only 82 percent of the q_x variance is ex-
pressed. The expectation of life at birth e_0 is very close to the index k.
This observation holds too for the mortality of 35-44 year-old women.
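
The share of variance expressed by a single index can be sketched as follows: it is the largest eigenvalue of the covariance matrix of the rates divided by the sum of all the variances. The sketch assumes the covariance matrix (the abstract does not say whether covariances or correlations were factored), finds the first component by simple power iteration, and uses made-up figures, not the 158 life tables; the function name is hypothetical.

```python
def first_component_share(rows, iterations=500):
    """Share of total variance carried by the first principal component.

    rows[i][j] is the value of variable j (e.g. a mortality rate for one
    age group) in table i. Returns (share, direction of the component).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [
        [sum(c[u] * c[v] for c in centred) / (n - 1) for v in range(p)]
        for u in range(p)
    ]
    # Power iteration; assumes the start vector is not orthogonal to the
    # leading eigenvector and that the variables are not all constant.
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[u][k] * v[k] for k in range(p)) for u in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[u] * sum(cov[u][k] * v[k] for k in range(p)) for u in range(p)
    )
    total_variance = sum(cov[j][j] for j in range(p))
    return eigenvalue / total_variance, v


# Made-up scores for four "tables" on two variables.
share, direction = first_component_share([[2, 0], [-2, 0], [0, 1], [0, -1]])
print(share)
```

A share well below one, such as the 82 percent quoted for the life tables, means that no single index can reproduce all the age-specific rates, which is the motivation for the further components.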

The first component gives the best average description of a hetero-
geneous group. When only the first 3 principal components are taken
into account, rotations reveal 3 relatively homogeneous sub-
groups to which appropriate indices can be applied.

Numerical values of each of the orthogonal indices φ_1, φ_2, φ_3 corre-
sponding to the 3 sub-groups are normally distributed about a mean
100 with a standard deviation of 25 for the 158 life tables: index φ_1
looks like a background indicator. It expresses 3/4 of the total variance
of the q_x; index φ_2 characterizes a part of the mortality above 40
years: 26 percent of the variance of male rates and 13 percent of that
of female rates; index φ_3 characterizes a part of the mortality at very
old ages: 61 percent of the variance of 80-84 age group rates.

Reconstitution of the 158 life tables: indices φ_1, φ_2, φ_3 do not give an
accurate description of the infant mortality, nor of the sex differentials
in mortality above age 35. To fill these gaps, more than 3 factors
should be taken into account.


cf. Population 1959, No. 4, pp. 637-682. 




THE BIOMETRIC SOCIETY 


Belgian Region 


The Société Adolphe Quetelet, the Region of the Biometric Society for
Belgium and the Belgian Congo, held its statutory General Assembly
on 23 March 1960 at the Faculty of Medicine of the University of Brussels.

During this meeting Mr. J. HENRY, President of the Société,
gave an account of the founding of our new quarterly journal
entitled BIOMETRIE—PRAXIMETRIE, the first issue of which
came off the press on 15 March 1960.

The President also announced the programme of the Third
Biometric Day, which the Société Quetelet will organise at Gembloux on
26 July 1960 as part of the Centenary Celebrations of the Institut
Agronomique de l’État.

The Assembly also admitted 46 new full members to the
Société.

Dr. L. MARTIN, at present President of the Biometric Society,
asked to be released from his duties as Secretary of the Société Quetelet
for a period of 2 years. Miss A. LENGER, assistant secretary,
was appointed acting secretary for this period.

At his own request, Professor P. DE NAYER, Vice-President of the
Société, was replaced by Mr. LAUDELOUT of the University of
Louvain.

The meeting ended with a talk by Mr. A. ROTTI,
Treasurer of the Société Quetelet, on the analysis of non-orthogonal
data in the case of a two-factor experiment.


PROGRAMME OF THE THIRD BIOMETRIC DAY
GEMBLOUX, 26 JULY 1960


Organised by the Société Adolphe Quetelet with the collaboration of the
Bureau de Biométrie (I.R.S.I.A.)2 as part of the Centenary Celebrations
of the Institut Agronomique de l’État.


Evolution and recent developments of the biometric sciences
in the context of agronomic research.


MORNING


—Welcome address.
J. M. HENRY, President of the Société Quetelet.
9h.45—The use of electronic computers in the analysis of sets
of experimental fields.
F. YATES, Head, Statistical Department, Rothamsted
Experimental Station, Harpenden, England.
10h.30—Application of biometric and statistical methods in
ecology.
Ph. BOURDEAU, Dean of the Faculty of Agronomy of the
Université Officielle du Congo Belge et du Ruanda-
Urundi, Astrida, Ruanda-Urundi.
11h. —Break.
11h.20—Problems of quantitative inheritance.
Dr. H. L. LE ROY, Institut für Tierzucht, Eidgenössische
Technische Hochschule, Zürich, Switzerland.
11h.50—The sex distribution in Cucurbita hybrids and its
relation to a contagious distribution of Neyman.
Prof. Dr. F. WEILING, Institut für Landw. Botanik,
Bonn, Germany.


AFTERNOON


14h.30—The power of tests in the analysis of variance of
randomised block designs: nomograms for choosing the
number of replications.
Dr. M. KEULS, Landbouwhogeschool (Agricultural
University), Wageningen, Holland.
15h. —Development of biometric and statistical methods
in agronomic research in the Belgian Congo and
Ruanda-Urundi.
J. M. HENRY, Adviser to I.N.E.A.C.1, former
Director General of I.N.E.A.C. in Africa.
15h.45—Development of biometric and statistical methods
in agronomic research in Belgium.
Dr. L. MARTIN, Lecturer at the Institut Agro-
nomique de l’État at Gembloux, Director of the Bureau de
Biométrie (I.R.S.I.A.).
16h.30—Break.
16h.50—Experimental designs in the selection of sugar beet
varieties.
M. SIMON, Director of the Institut Belge pour l’Amélio-
ration de la Betterave.
17h.20—Sampling methods in forestry.
Anne LENGER, Agricultural Engineer (Waters and
Forests section), Chef de Travaux at the Bureau de Biométrie
(I.R.S.I.A.).


1Institut National pour l’Étude Agronomique du Congo Belge (I.N.E.A.C.).
2Institut pour l’Encouragement de la Recherche Scientifique dans l’Industrie et l’Agriculture
(I.R.S.I.A.).


REGISTRATION


Registration for the Third Biometric Day is free of charge;
applications may be sent from now on to the Secretariat of the Société
Adolphe Quetelet, 7 rue Héger-Bordet, Bruxelles 1, Belgique.


German Region


The 6th Biometric Colloquy of the German Region of this Society 
was held at Leipzig from 23rd to 25th January 1959. The main topics 
were population structure, dynamics and genetics, and analysis of 
multi-factorial experiments, besides which papers on general and 
special subjects from biometry and biomathematics were read, total-
ling 25. The conference was attended by 156 enrolled participants,
including Li guests from abroad; it was overshadowed by the sudden 
death of Wilhelm Ludwig, the region’s secretary-treasurer, on the
first day of the colloquy (cf. obituary in Biometrics 15, 360).


ENAR 
MEETINGS OF E.N.A.R. 


At the invitation of the Western North American Region, E.N.A.R. 
will meet jointly with them and with the American Statistical Asso- 
ciation at Stanford University, Palo Alto, California, on August 23-26, 
1960. Titles and abstracts, the latter in duplicate in the form published 
in Biometrics, of contributed papers for E.N.A.R. should be sent to
Professor Oscar Kempthorne, Department of Statistics, Iowa State
College, Ames, Iowa. 

W.N.A.R. will also meet jointly with the American Institute of
Biological Sciences at Oklahoma State University in Stillwater, Okla-
homa, on August 28-September 2, 1960.

Programs for both these meetings will be announced later.


TWELFTH ANNUAL MEETING OF THE BIOMETRIC 
SOCIETY (ENAR) WASHINGTON, D. C., 
DECEMBER 27-30, 1959 

Program 
DESIGN OF EXPERIMENTS—I. 

Chairman: Oscar Kempthorne—C. P. Cox: The relationship between 
covariance and individual curvature analyses of experiments with back-
ground trends. D. S. Robson and G. F. Atkinson: Testing homogeneity
of regression coefficients in a one-way analysis of variance. M. E. 
Turner: The need for experimental design research for nonlinear trends 
in medical biology. 


MULTIVARIATE ANALYSIS—I.


Chairman: S. S. Wilks—A. W. Marshall: Multivariate Chebyshev 
type inequalities. I. Olkin: Some applications of multivariate Cheby-
shev inequalities. R. A. Wijsman: A representation of the Wishart
matrix. 


CONTRIBUTED PAPERS—I.


Chairman: Edmund A. Gehan—N. R. Bohidar: Role of sex-linked 
genes in quantitative inheritance and related topics. G. Stanley Wood-
son: Hospital blood requirements. W. T. Stille: Protection and infection
rates in (influenza and adenovirus) vaccine evaluations. 


DESIGN OF EXPERIMENTS—II. 


Chairman: S. L. Andersen—W. T. Federer: Rectangular lattices for
v = pk” in incomplete blocks of size k. G. E. P. Box: Designs of type B’.
John Mandel: The statistical evaluation of test methods. Discussant: 
H. O. Hartley 


MULTIVARIATE ANALYSIS—II. 


Chairman: R. L. Anderson—J. E. Jackson and R. A. Bradley: 
Sequential chi-square and T^2 tests. R. F. Tate and I. Olkin: Multi-
variate correlation models with mixed discrete and continuous vari- 
ables. S. N. Roy: Analysis of multi-factor multi-response experiments 
with mixed factor and response types. 


STATISTICAL METHODS IN HUMAN GENETICS. 


Chairman: Seymour Geisser—Richard Osborne and Oscar Kemp-
thorne: Methodology of twin studies. Donald Richter and Seymour
Geisser: A statistical model for diagnosing zygosis by ridge-count. 
Gordon Allen: A differential method of estimating type frequencies in 
triplets and quadruplets. 


CLASSIFICATION AND DISCRIMINATION—I. 


Chairman: Herbert Solomon—T. W. Anderson and R. R. Bahadur: 
Classification into multivariate normal distributions with unequal co- 
variance matrices. A. Bowker: A representation for Anderson’s W 
statistic. R. Sitgreaves: An asymptotic expansion for the distribution 
function of Anderson’s classification statistic W.


CLASSIFICATION AND DISCRIMINATION—II.


Chairman: S. W. Greenhouse—W. G. Cochran and C. E. Hopkins: 
Some problems in multivariate classification using discrete variates. 
M. J. R. Healy: The descriptive use of discriminant functions. P. J. 
Hoffman: Linear multivariate models as representative of clinical judg- 
ment. 


RATIO ESTIMATION. 


Chairman: H. O. Hartley—H. O. Hartley: The application of ratio 
and regression estimators to analytic studies. J. N. De Pascual: Ratio 
estimation in stratified sampling. D. S. Robson and C. Vithayasai:
Unbiased component-wise ratio estimation. A. Ross: Sampling experi-
ments on ratio estimators. W. H. Williams: Unbiased estimation.


STATISTICAL DEFINITIONS, CONCEPTS, 
AND CATEGORIES. 


Chairman: Dudley Kirk—Frederick F. Stephan: Relations of some 
social science concepts to statistical data. L. E. Rowebottom and D. H. 
Steinthorson: Some principles of definition in statistics. Morris H. 
Hansen, Leon Pritzker and Joseph Steinberg: The evaluation program 
of the 1960 censuses. 


MATHEMATICAL MODELS IN THE LIFE SCIENCES. 


Chairman: Harold F. Bright—Jerzy Neyman and Elizabeth L. Scott: 
The two-stage mutation model for carcinogenesis and experimental 
means of its verification. Wilfred Rall: On the relation between the 
structural geometry and the function of individual neurons. Chin 
Long Chiang: Competing risks and the follow-up study. 


PLANNING CLINICAL TRIALS. 


Chairman: Jerome Cornfield—Panel: Irwin D. J. Bross, Spencer M. 
Free, Jr., Harry Gold, Edmund A. Gehan and John Moyer. 


CONTRIBUTED PAPERS—II. 


Chairman: Marvin A. Kastenbaum—Ira A. De Armon, Jr: Estimating 
shelf life of bacterial populations. Robert J. Buehler: Application of 
linear regression to frequency graduation. Clifford J. Maloney: Disease 
severity quantitation. Robert G. Hoffman: Some experiences in the
practice of medical statistics. W. A. Glenn and H. A. David: Ties 
in paired comparison experiments. Rolf E. Bargmann: Some generalized 
distributions and confidence intervals in multivariate analyses. K. M. 
Patwary: Error and non-error models in bio-assay. (by title)


Members Present


The Twelfth Annual Meeting of the Biometric Society (ENAR)
was held jointly with the American Statistical Association in Washing- 
ton, D. C., December 27-30, 1959. The following people attended: 
Ambelang, R. Anderson, T. Anderson, V. Anderson, Anscombe, Bailar, 
Bailey, Bain, Baird, Bancroft, Beall, Bearden, Bechhofer, Bingham, 
Birnbaum, Bliss, Bohidar, Box, Boyd, Bradley, Brandt, Brenna, Bross, 
M. Bruyere, P. Bruyere, Burr, Busch, Cameron, Carroll, M. Carter, 
R. Carter, Charter, Chassan, Chew, Chiang, Chu, Ciminera, Clatworthy, 
Cochran, Connor, Coons, Cornell, Cornfield, Coulter, Covert, C. Cox, 
E. Cox, G. Cox, Dalton, Daniel, Darroch, David, Davis, Day, DeArmon, 
Degan, DeGray, Deming, Diamond, Donnelly, Drew, Dunbar, A. 
Duncan, D. Duncan, Dutton, Eisen, Eisenhart, Elkin, Elston, Federer, 
Federspiel, Feigl, Ferris, Fetters, Finkner, Foster, Fraser, Free, Fryer, 
Gani, Gardiner, Gart, Gates, Gehan, Glenn, Gnanadesikan, M. Gordon, 
Gosslee, Greenberg, Greenhouse, Gumbel, Gurian, Gurland, Hafley, 
Hagans, W. Hall, Hanson, Harris, Hershvarger, Hartley, Harvey, 
Hayne, Healy, Hemphill, Hermann, Herrera, Hoffman, Hogben, C. 
Hopkins, Horn, Horner, Hotchkiss, Hotelling, Houseman, Hunter, 
Ipsen, James, Josie, Kastenbaum, Kempthorne, Kennedy, Kimball, 
King, Klett, Kossack, Kramer, Kruskal, Kullback, Kupperman, Lamm, 
Lamphiear, Leary, LeClerg, Levene, Lewis, Lieberman, Lilienfeld, 
Linder, Lipstein, Littell, Lukacs, Lunger, Lyerly, Maas, Mainland, 
Maloney, Malzberg, Mendel, Manos, Marshall, F. Martin, M. Martin, 
Meier, F. Miller, Minton, Mode, R. Monroe, Moore, Moshman, Mos-
teller, Muench, Munch, Nadler, Norris, Northam, H. Norton, J. Norton,
Olds, Olmstead, Pauls, Perrin, Pincus, Pratt, Reid, Rigney, Ringle, 
Roberts, Robson, Rosania, Ross, Rulon, S. Russell, T. Russell, Sachs,
Sagen, Sammons, Schultz, Scott, Shelly, Silber, Sitgreaves, Skory, 
Smart, H. Smith, J. Smith, R. Smith, Sowinski, St. Pierre, Stander, 
Starr, Stavropoulos, Stearman, Steel, Sternberg, Stier, Stille, Svejda, 
Tabler, R. Taylor, W. Taylor, E. Thomas, G. Thomas, Thompson, 
Torrey, Tukey, Turner, Vogel, Votaw, Wadley, Wallace, Walpole, 
Warren, Watson, I. Weiss, W. Weiss, Wells, Whidden, White, Whitney, 
Whitwell, Wilkinson, Wilks, G. Williams, W. Williams, Wine, Wolfe, 
Woo, Wood, M. Woodbury, Woodson, Youden, Zelen. 
ANNUAL FINANCIAL STATEMENTS
THE BIOMETRIC SOCIETY 1959
BALANCE SHEET
Assets
Cash: Bank Balance $7,246.24 
Petty Cash 19.40 $7,265.64 
Liabilities 
Subscriptions, 1960 $ 137.00 
Dues, 1960 80.00 $ 217.00 
Surplus, January 1, 1959 5,233.70 
Gain for Period 1,814.94 7,048.64 
$7,265.64 
Audited: Herbert M. Beckler (Signed) 
Date: March 22, 1960
INCOME AND EXPENDITURE STATEMENT
Income 
Subscriptions, 1958 $ 374.00 
Subscriptions, 1959 6,150.50 $ 6,524.50
Dues, 1958 118.75 
Dues, 1959 2,435.75 2,554.50 
Sustaining memberships, 1959 641.00 
Back dues and subscriptions 57.75 
Regional allotments $ 109.25 
BIOMETRICS allotments from sustaining members 250.00 
Back issues 77.50 
Member subscriptions to Journal of ASA 70.00 
Payments for directories 3.00 
Overpayments 15.67 
$ 525.42 
Less: Credits and regional allotments used 118.60 406.82 


Total Income $10,184.57
Expenditures
BIOMETRICS $7,174.49 
Postage 162.71 
Office supplies and stationery 93.23 
Secretary’s office 750.00 
Pharmacology symposium 24.50 
Member subscriptions to Journal of ASA 45.00 
P. O. Box rental 9.00 
Shipping charges 22.72 
Addressing and mailing services 87.98 
Total Expenditures $ 8,369.63 
Excess of Income Over Expenditures 1,814.94 
$10,184.57 
Audited: Herbert M. Beckler (Signed) 
Date: March 22, 1960 
BIOMETRICS VOLUME 15 
STATEMENT OF OPERATIONS 
For the Year Ending January 31, 1960 
Income 
Biometric Society 
948 at $4.00 $3,792.00 
943 at 2.75 2,593.25 
9 sustaining at 25.00 225.00 $ 6,610.25 
1077 direct at 7.00 7,539.00 
Sale of back issues 
Biometric Society 413.50 
Editor’s Office 3,851.68 4,265.18 
March 1959 issues at cost to 
Biometric Society 21.93 
Sale of Reprints 1,434.24 
Refund .49
Overpayments 4.50 
Payments for Regions 18.50 
Total Income from Operations $19,894.09 
Expenditures 
Cost of Journals 
Printing 
Issue No. 1 $3,306.78 
Issue No. 2 3,158.31 
Issue No. 3 2,675.42
Issue No. 4 2,802.91 $11,943.42
Mailing and Express Charges
Issue No. 1 $ 207.27
Issue No. 2 240.25
Issue No. 3 192.96
Issue No. 4 210.50 850.98 $12,794.40


Cost of Reprints (See Comments)
Printing
Issue No. 4 (1958) $ 254.92
Issue No. 1 393.33
Issue No. 2 360.36
Issue No. 3 321.75 $ 1,330.36
Mailing Charges Reprints
Issue No. 4 (1958) $19.48
Issue No. 1 33.10
Issue No. 2 29.49
Issue No. 3 24.38 106.45 $ 1,436.81
Mailing I. S. I. Prospectus $ 125.68
Reprinting (500 copies each) 
June 1946 $ 72.05 
December 1946 82.60 
March 1947 131.14 
March 1956 205.87 491.66 
Operating Expenses 
Stamps (See Comments) $ 386.21 
Stationery, mailing envelopes 356.75 
Moving costs 100.00 
Addressographing 76.47 
Salary and F. I. C. A. 967.21 
Office Supplies 29.81 
Telephone Calls 5.76
Express Charges 78.80
Refunds on subscriptions and overpayments 52.00
Post Office Deposit 15.00 
Loss on Exchange 4.38 
Transfers to Regions 18.50 2,090.89 
Total Expenditures From Operations $16,939.44 
Income From Operations (from Page 1) $19,894.09 
Expenditure From Operations 16,939.44 


Surplus From Operations $ 2,954.65 
Non-operating Items


Income 
U.S. National Science Foundation Grant 1,582.36 
Bank Interest 626.25 
Credit, William Byrd Press (See Comments) 162.00 
Mailing I. S. I. Prospectus 125.68 
Announcement 35.00 $2,531.29 
Expenditures 
Credits to 1958 Accounts Receivable 
(Cancellations) 29.00 
Surplus From Non-Operating Items 2,502.29 
Gross Surplus, 1-31-60 $ 5,456.94 
BIOMETRICS VOLUME 15 
BALANCE SHEET 
January 31, 1960 
Assets 
Accounts Receivable $ 1,727.85 
Bank Balances 
Savings No. 1 (Blacksburg) $10,091.99 
Savings No. 2 (Blacksburg) 9,354.74 
Savings No. 3 (Blacksburg) 3,075.45
Checking (Tallahassee) 8,148.89
Checking (Blacksburg) 220.00
Checking (Blacksburg) (.46) 30,890.61
(See Comments) 


Liabilities and Surplus 


Subscriptions to Vol. 16 $ 5,958.05 
Subscriptions to Vol. 17 105.00 
Subscriptions to Vol. 18 35.00 
Surplus from previous volumes 21,063.47 
Surplus from Volume 15 5,456.94
Totals $32,618.46 $32,618.46 
Comments: 


(1) Not included in assets is stock of back issues from Volumes 1-15. 

(2) Not included in expenses is printing bill for December reprints of $314.70. 

(3) Cost of stamps used by the Assistant Managing Editor was estimated to be 
$80.00. All other expenses were incurred by the Editorial Office. 

(4) Checking Account in Blacksburg is in process of being closed out due to change 
of Editor’s Office to Tallahassee during the year. 

(5) Credit from William Byrd Press is for adjustment of December, 1958 reprints. 


Audited March 18, 1960 
Richard Q. Conrad (Signed)
CHANGES IN MEMBERSHIP
(January 15, 1960—April 15, 1960) 
Changes of Address 


Mr. Munir Ahmad, Institute of Statistics, Panjab University, Lahore, 
W. Pakistan. 

Dr. Leo A. Aroian, 8005 Altavan Avenue, Los Angeles 45, California,
U.S.A.

Mr. Leslie Norman Balaam, Department of Agriculture, University 
of Sydney, Sydney N.S.W., Australia. 

Dr. Glenn E. Bartsch, Department of Preventive Medicine, Western 
Reserve University, Cleveland 6, Ohio, U.S.A. 

Dr. G. E. P. Box, Mathematics Research Center, U. S. Army, Uni-
versity of Wisconsin, Madison 6, Wisconsin, U.S.A.

Dr. Martin A. Brumbaugh, Statistics Division, Bristol Laboratories, 
Syracuse 1, New York, U.S.A. 

Dr. Hugh Bishop Cannon, Statistical Research Service, Canada Depart-
ment of Agriculture, Ottawa, Ontario, Canada.

Dr. Victor Chew, Computation and Analysis Laboratory, U. S. Naval 
Weapons Laboratory, Dahlgren, Virginia, U.S.A. 

Mr. William P. Chu, 4160 Doney Street, Columbus 13, Ohio, U.S.A. 

Dr. Angelo Cresseri, Via Fogazzaro 27, Milano, Italy.

Dr. Gian Maria Curto, Via Giulio Cesare 11, Bergamo, Italy. 

Dr. J. H. Davidson, 7 White Street, Schenectady, New York, U.S.A. 

M. Jean Dejardin, Institut d’Enseignement et de Recherches Tropi- 
cales, 80 route d’Aulnay, Bondy (Seine) France. 

Mr. Bruce A. Drew, The Pillsbury Company, P. O. Box 594, New 
Albany, Indiana, U.S.A. 

Mr. Magdi El-Kammash, Economics Department, Duke University, 
Durham, North Carolina, U.S.A. 

Dr. N. R. Fraser, Department of Entomology and Parasitology, P. O. 
MacDonald, Quebec, Canada. 

Dr. J. Gani, Department of Mathematics, University of Western
Australia, Nedlands, W. Australia. 

Dr. H. Geidel, Scheffeldfeld 21, Hannover-Bothfeld, Germany. 

Mr. Pierre Gilbert, 29 Ave Capitaine Piret, Brussels 15, Belgium. 

Mr. H. Stanley Graf, New Canaan Avenue, Rt. 2, Norwalk, Con- 
necticut, U.S.A. 

Dr. F. M. Hemphill, Statistics and Analysis Branch, National Insti- 
tutes of Health, Bethesda 14, Maryland, U.S.A. 

Dr. George W. Hervey, 1280 Brookfield Avenue, Apt. 3, Sunnyvale, 
California, U.S.A. 
Dr. Henry J. Horn, P. O. Box 415, Herndon, Virginia, U.S.A.

Mr. Donald K. Hotchkiss, Research Division, Ralston Purina Company, 
St. Louis, Missouri, U.S.A. 

Dr. Peter Ihm, 51-53 rue Belliard, Bruxelles, Belgium. 

Mr. J. Edward Jackson, 371 Edgemere Drive, Rochester 12, New York, 
U.S.A. 

Mr. S. H. Justesen, Jac. P. Thysselaan 65, Bennekom, Netherlands. 

Dr. John R. Kinzer, 829-Eleventh Street, Santa Monica, California, 
U.S.A. 

Dr. Leonard S. Kogan, 1333 E. 27th Street, Brooklyn 10, New York, 
U.S.A. 

Mrs. Katherine B. Ladd, 308 Lonsdale Road, Toronto, Ontario, Canada. 

Mr. Richard Polk Lehmann, Department of Animal Industry, N. C. 
State College, Raleigh, North Carolina, U.S.A. 

Dr. R. T. Leslie, National Standards Laboratory, University of Sydney, 
Sydney, N.S.W., Australia.

Dr. S. Lipton, University of N.S.W., Sydney, N.S.W., Australia. 

Dr. Dietrich Lorenz, Tropon-Werke, Koln-Mulheim, Germany. 

Mr. John G. Magistad, 2121 Hendola Drive, N.E., Albuquerque, New 
Mexico, U.S.A. 

Dr. Mario Morea, Via B. Gonzati 15, Padova, Italy. 

Mr. W. V. Neisius, 4121 Bluemound Road, Rolling Hills Estates, 
California, U.S.A. 

Dr. Mary Ellen Patno, 721 So. Forest, Apt. 407, Ann Arbor, Michigan, 
U.S.A. 

Mr. Kamini Mohan Patwary, Division of Communicable Disease, 
Palais des Nations, WHO, Geneva, Switzerland. 

Prof. Mahlon F. Peck, Department of Mathematics, Western Maryland 
College, Westminster, Maryland, U.S.A. 

Mr. D. R. Read, Cadbury Brothers Ltd., Bournville, Birmingham, 
England. 

Miss Edith Reid, 20 Woodcliff Avenue, North Bergen, New Jersey, 
U.S.A. 

M. Daniel Schwartz, Institut Gustave Roussy, 16 bis, Avenue Paul- 
Vaillant-Couturier, Villejuif (Seine) France. 

Dr. Hilary L. Seal, Osborn Zoological Laboratory, Yale University, 
New Haven, Connecticut, U.S.A. 

Dr. Robert R. Shrode, Wm. H. Miner Institute, Chazy, New York, 
U.S.A. 

Mr. Norman Willison Simmonds, Department of Potato Genetics, 
John Innes Horticultural Institution, Bayfordbury, Herts., England. 

Dr. Erich Soom, Steinhausen ZG, Switzerland. 




Dr. Francis L. Stanonis, 5447 Crawford Drive, Columbus 24, Ohio, 
U.S.A. 

Mrs. Hanna D. Sylwestrowicz, Washington Corner Road, Bernards- 
ville, New Jersey, U.S.A. 

Mr. Earl A. Thomas, 2322 Surrey Lane, Falls Church, Virginia, U.S.A. 

Professor Antonio Tizzano, Istituto di Igiene, Universita degli Studi, 
Napoli, Italy. 

Mr. Glen F. Vogel, 5214-8th Road, S., Apt. 3, Arlington 4, Virginia,
U.S.A.

Dr. U. Vogliazzo, Ospedale Mauriziano, Aosta, Italy. 

Mr. Francis R. Watson, 4099 Indianola, Columbus 14, Ohio, U.S.A. 

Mr. Vernon E. Weckwerth, 1220 Mayo, University of Minnesota, 
Minneapolis 14, Minnesota, U.S.A. 

Prof. Dr. Dietrich Wichmann, Wilhelmsplatz 7, Bonn, Germany. 

Miss Roberta A. Wilcox, 288 Ehrhardt Road, Pearl River, New York, 
U.S.A.

Mr. C. B. Williams, Department of Zoology, University College, 
Aberystwyth, Wales. 

Dr. Evan J. Williams, Box 71, G.P.O., Canberra, A.C.T., Australia.


New Members 

At Large 

Mr. Sydney E. Cruise, Department of Mathematics, University of 
Natal, Durban, South Africa. 

Prof. Dr. Jan Czekanowski, Ulica Kanclerska 14, Poznan, Poland. 

Mr. Yuri P. Nekrutenko, Kiev 70, Andreyevski spusk 3-5, Ukraine, 
U.S.S.R. 


Australasian Region 


Mr. C. A. P. Boundy, C.S.I.R.O. Private Bag, Nedlands, W. Australia. 

Dr. B. Diamantis, 2 Raleigh Street, Windsor S. 1, Victoria, Australia. 

Dr. G. Gregory, Department of Statistics, University of Melbourne, 
Melbourne, Victoria, Australia. 

Mr. Terence P. O’Brien, 24 Ardmillan Road, Moonee Ponds, Victoria,
Australia. 

Dr. P. N. R. Sutton, 24 Wellington Street, Brighton S. 5, Victoria, 
Australia. 

Mr. M. Ulehla, ICIANZ Research Laboratories, Newson Street, Ascot 
Vale, W. 2, Victoria, Australia. 


Eastern North American Region 


Dr. John I. Alman, Boston University, 725 Commonwealth Avenue, 
Boston 15, Massachusetts, U.S.A. 
Dr. E. T. Angelakos, Boston University School of Medicine, 80 East
Concord Street, Boston 18, Massachusetts, U.S.A. 

Mr. Joe N. Boyd, Biometrical Services, Plant Industry Station, Belts- 
ville, Maryland, U.S.A. 

Mr. Seymour Bush, 3402 Hillsboro Street, Raleigh, North Carolina, 
U.S.A. 

Dr. Osmer Carpenter, The Upjohn Company, 301 Henrietta Street, 
Kalamazoo, Michigan, U.S.A. 

Mr. Frederick L. Carter, Jr., Rt. Cambria, Virginia, U.S.A. 

Mr. Richard A. Damon, Jr., Biometrical Services, ARS, Agricultural 
Research Center, Beltsville, Maryland, U.S.A. 

Mr. L. Lee Eberhardt, Game Division, Michigan Dept. of Conservation, 
Lansing 26, Michigan, U.S.A. 

Dr. Khalil M. El-Kashlan, School of Public Health, 600 W. 168th 
Street, New York 32, N. Y., U.S.A. 

Dr. Dale O. Everson, Biometrical Services, ARS, Agricultural Research 
Center, Beltsville, Maryland, U.S.A. 

Miss Susan J. Gordon, Lemuel Shattuck Hospital, 170 Morton Street, 
Jamaica Plain, Massachusetts, U.S.A. 

Dr. R. K. Haddad, Bureau of Research, c/o NJNPI, Box 1000, Prince- 
ton, New Jersey, U.S.A. 

Mr. John R. Hatfield, Eli Lilly and Company, Indianapolis 6, Indiana, 
U.S.A.

Miss Emmarie C. Hemphill, Division of General Medical Sciences, 
National Institutes of Health, Bethesda 14, Maryland, U.S.A. 

Dr. Lonnie L. Lasman, Department of Statistics, Florida State Uni- 
versity, Tallahassee, Florida, U.S.A. 

Mr. Aden C. Magee, Animal Husbandry Department, North Carolina 
State College, Raleigh, North Carolina, U.S.A. 

Mr. James H. Meade, Jr., Department of Animal Husbandry, University 
of Florida, Gainesville, Florida, U.S.A. 

Mr. Lester W. Preston, Jr., 2933 Claremont Road, Raleigh, North 
Carolina, U.S.A. 

Prof. David Rosenblatt, Mathematics and Statistics, American Uni- 
versity, Washington 6, D. C., U.S.A. 


Dr. Norman C. Severo, Department of Statistics, University of Buffalo,
Buffalo 14, New York, U.S.A.

Dr. Marjorie L. Sutherland, R. R. 3, Box 258, Greenwood, Indiana, 
U.S.A.

Mrs. Marguerite T. Wood, 900 Quincy Street, N. W., Washington 11, 
D. C., U.S.A. 
French Region


M. Philippe Gounot, 29 Quai Branly, Paris, France.

M. Philippe Lazar, 8 rue Lentonnet, Paris Ye, France. 

M. Pierre Lossois, 120 bis, avenue de Verdun, Issy-les-Moulineaux, 
(Seine) France. 

M. Philippe Merat, Centre National de Recherches Zootechniques, 
Jouy-en-Josas, (S. et O.) France.

German Region 

Dipl. Math. Georg Bambynek, Mulheimer Str. 72, Leverkusen, Ger- 
many. 

Dr. Heinz Bekemeier, Leninstr. 22a, Halle/Saale, Germany. 

Dipl. Land. K. Bellmann, Am Sportplatz 16, Gross-Lusewitz kreis 
Rostock, Germany. 

Mrs. Hannelore Beyer, Geschwister-Scholl-Str. 47, Miltitz bei Leipzig,
Germany. 

Dr. O. Ewert, Psychol. Inst. d. Univ., Mainz, Germany. 

Dr. Friedrich Jochum, Kerpener Str. 1. -15, Koln-Lindenthal, Germany. 

Dr. med. Herbert Lippert, Pettenkoferstrasse 11, Munchen 15, Germany. 

Dr. Lucie Osadnik, Ferdinand-Rhode-Str. 10, Leipzig C 1, Germany.

Prof. Dr. M. Prodan, Abt. für Forstliche Biometrie, Wallstr. 22, Frei-
burg i. Br., Germany.

Dr. Wilhelm Schafer, Darmstadter Str. 284, Bensheim-Auerbach/ 
Bergstr., Germany. 

Dr. med. Ranier Thierbach, Leninstrasse 20, Halle (Saale), Germany. 

Netherlands 


Ir. A. R. Bloemena, Mathematisch Centrum, Amsterdam, Netherlands. 

Mr. J. Borst, arts. Moreelsestraat 4, Amsterdam (z) Netherlands. 

Mr. H. de Jonge, Institut voor Preventieve Geneeskunde, Leiden,
Netherlands. 

Ir. L. de Rijke, Bibliotheek Ministerie van Landbouw, den Haag, 
Netherlands. 

Miss A. Verbeek, c/o Mathem. Centrum, 2e Boerhavestraat 49, Amster-
dam, Netherlands.


Sweden 


Dr. Claus Rerup, Department of Pharmacology, The Royal University,
Lund, Sweden.

Dr. R. A. Reyment, Kungstensgatan 45, Geologiska Institutet, Stock-
holms Hogskola, Sweden.
Western North American Region


Mr. Walter E. Cole, Boise Research Center, 316 East Myrtle Street, 
Boise, Idaho, U.S.A. 

Dr. Harvey F. Dingman, P. O. Box 100, Pomona, California, U.S.A. 

Mr. P. L. Northcott, Forest Products Laboratory, 6620 Northwest 
Marine Drive, Vancouver, B. C., Canada. 
NEWS AND ANNOUNCEMENTS


Members are invited to transmit to their National or Regional Secre- 
tary (if members at large, to the General Secretary) news of appointments, 
distinctions, or retirements, and announcements of professional interest. 


TRAINEESHIPS FOR PUBLIC HEALTH STATISTICIANS


The Public Health Service has announced the availability of trainee- 
ships for graduate training of professional public health personnel 
during the 1960-1961 academic year. 

Traineeships in public health statistics are available to qualified 
persons. They provide stipends from a minimum of $250 per month
for a post-bachelor candidate to a maximum of $400 per month for a 
post-doctoral candidate and additional allowances for dependents, 
travel of the trainee, and academic tuition and fees. 

Additional information and application forms may be secured from
the Division of General Health Services, Public Health Service, U. S.
Department of Health, Education, and Welfare, Washington 25, D. C.


FIRST INTERNATIONAL CONGRESS OF HISTOCHEMISTRY
AND CYTOCHEMISTRY 


Paris, August 28-September 3, 1960. 


In 1960, from August 28 to September 3, the First International
Congress of Histochemistry and Cytochemistry will be held in Paris.
It is organized under the auspices of the Société Française d’Histochimie
in collaboration with the histochemical societies in existence all over
the world, especially the American Histochemical Society, the Deutsche
Arbeitsgemeinschaft für Histochemie, the Société Belge d’Histochimie,
the Italian and Japanese histochemical societies, and several non-
autonomous sections of histochemistry.


All correspondence on scientific Congress matters should be addressed to: 


Wegmann, Institut d’Histochimie Médicale, 45, rue des Saints-
Pères, Paris (6e), France.


The full program will be sent on request.
NEWS ABOUT MEMBERS
ENAR 


Glenn E. Bartsch has taken the position of Assistant Professor of 
Biostatistics at Western Reserve University after completing a U. S. 
Public Health Service Postdoctoral Research Fellowship at the London 
School of Hygiene and Tropical Medicine. 

G. E. P. Box, formerly Director of Statistical Techniques Research 
Group, Princeton University, has been appointed Head of the new 
Department of Statistics at the University of Wisconsin. 

Victor Chew, formerly an Assistant Statistician in the Institute of 
Statistics at Raleigh, North Carolina, is now employed as a Mathema- 
tical Statistician in the Computation and Analysis Laboratory of the 
U.S. Naval Weapons Laboratory, Dahlgren, Virginia. 

Bruce A. Drew is presently Statistician for the Pillsbury Company 
in New Albany, Indiana. He was formerly Senior Chemist, Hercules 
Powder Company, Hopewell, Virginia. 

Magdi El-Kammash is Research Associate in the Department of 
Sociology at Duke University, Durham, North Carolina. 

John Gurland, Professor of Statistics at Iowa State University, 
has been awarded a travel grant by the Committee on International 
Conference Travel Grants of the American Statistical Association, to 
attend the Biometric Society Symposium on Quantitative Methods in 
Pharmacology at the University of Leyden, Netherlands, May 10-13. 
Dr. Gurland will act as the official representative of the American 
Statistical Association at this Symposium. 

F. M. Hemphill has taken the post of Chief of Design and Analysis 
Section, Division of Research Grants, National Institutes of Health. 
She was formerly Professor in the School of Public Health, University 
of Michigan. 

Donald K. Hotchkiss has recently accepted the post as Manager, 
Research Statistics Department at the Ralston Purina Company. 

S. K. Katti completed requirements for the Ph.D. degree in statistics 
at the Iowa State University in January 1960. He has joined the 
staff of the new Department of Statistics at the Florida State Uni-
versity, Tallahassee, Florida, as Assistant Professor of Statistics.

Richard P. Lehmann is a Graduate Research Assistant in Animal
Breeding at the N. C. State College Agricultural Experiment Station,
Raleigh, North Carolina.

Herbert L. Lombard, internationally known specialist in cancer 
control, and until his recent retirement, Director of the Division of 
Cancer and Chronic Diseases, Massachusetts Department of Public 
Health, was officially cited by the Commonwealth and the medical
profession for his many outstanding contributions in the fight against
cancer. Governor Foster Furcolo made the citation presentation
January 20 at the State House at a special ceremony attended by
distinguished leaders in medicine and public health. 

Stanley W. Nash of the University of British Columbia has been 
appointed as Visiting Associate Professor at the Statistical Laboratory, 
Iowa State University for a period of one year beginning July 1, 1960. 

Mary Ellen Patno, formerly Assistant Professor of Biostatistics 
at the University of Pittsburgh, is now Associate Professor of Public 
Health Statistics at the University of Michigan. 

Charles E. Redman, Jr., is presently a Senior Biometrician, Statis- 
tical Research Department, Eli Lilly and Company. 

Robert R. Shrode is employed as a Geneticist at the Wm. H. Miner 
Agricultural Research Institute in Chazy, New York. He was formerly 
Professor of Genetics at the A. and M. College of Texas. 

Donald F. Starr has left his position as Research Director of the
V. D. Anderson Company in Cleveland, Ohio, and is now working as 
a consulting chemist in Grand Island, Nebraska. 

Earl A. Thomas is presently a staff member of the Institute for
Defense Analysis. He was formerly a Technical Advisor at the Bal- 
listic Missiles Division, Burrough’s Research Center, Paoli, Pennsyl- 
vania. 

John W. Tukey has been appointed to membership on President 
Eisenhower’s Science Advisory Committee for the four-year term
beginning January 1, 1960. 

Vernon E. Weckwerth, formerly Head of Research and Statistics
of the American Hospital Association and Assistant Director of the
Hospital Research and Education Trust in Chicago, Illinois, is now a
Lecturer and Summer Session Administrative Director at the Uni-
versity of Minnesota.

Roberta A. Wilcox, formerly Senior Biostatistician with the Depart- 
ment of Health, New York State, has taken the position of Research 
Statistician with the Lederle Laboratories in Pearl River, New York. 
A Multiplicative Model for Analyzing Variances which are Affected by Several

Estimation in the Truncated Poisson Distribution when Zeros and Some
Test of the Hypothesis that a Linear Regression System Obeys Two Separate
TECHNOMETRICS


A Journal of Statistics for the
Physical, Chemical, and Engineering Sciences 


Vol. 2, No. 1 February 1960 


CONTENTS 
Some Remarks on Wild Observations .................... W. H. Kruskal
Statistical Estimation of the Gasoline Octane Number Requirement of New 
Model Automobiles ................ C. S. Brinegar and R. R.
The Effect of Sequential Batching for Acceptance—Rejection Sampling 
Upon Sample Assurance of Total Product Quality 
M. and G. L. Burrows
Elements of the Theory of Extreme Values.................... B. Epstein
System Efficiency and Reliability. ....... R. E. Barlow and L. C. Hunter
Aids for Fitting the Gamma Distribution by Maximum Likelihood
J. A. Greenwood and D. Durand
Experimental Designs to Adjust for Time Trends......... Hubert M. Hill
Tests for the Validity of the Assumption that the Underlying Distribution of 
Programming Fisher’s Exact Method of Comparing Two Percentages 
W. H. Robertson
Misclassified Data from a Binomial Population....A. Clifford Cohen, Jr.


Vol. 2, No. 2 May 1960 
CONTENTS 

Locating Outliers in Factorial Experiments..................... C. Daniel


Discussion of the Papers of Messrs. Anscombe and Daniel 

W. H. Kruskal, T. S. Ferguson, J. W. Tukey and E. J. Gumbel
Tests for the Validity of the Assumptions that the Underlying Distribution of 
The Partial Duplication of Response Surface Designs.......... O. Dykstra
A Rank Sum Test for Comparing all Pairs of Treatments....R. G. D. Steel

The Percentile Points of Distributions Having Known Cumulants 
R. A. Fisher and E. A. Cornish
An Approximation to the Negative Moments of the Positive Binomial Useful 


Order Statistics from the Gamma Distribution................ S. S. Gupta


Technometrics is published quarterly in February, May, August, and 
November. The annual non-member subscription rate is $8.00. To members 
of the American Statistical Association and the American Society for Quality 
Control the rate is $6.00. Checks should be made payable to Technometrics 
and addressed to Technometrics, Post Office Box 587, Benjamin Franklin
Station, Washington 6, D. C. 
BIOMETRIE—PRAXIMETRIE


Tome I, N° 1 January, 1960 


TABLE OF CONTENTS 


La probabilité mathématique dans les sciences naturelles. .Sir R. A. Fisher
Traduction: L. Martin 


Relation entre l’incidence des Antestiopsis et les dégats sur Caféier 
Arabica G. Foucart 


Ajustement d’un faisceau de systèmes de polynômes orthogonaux. Applica-
tion au développement de l’index histaminolytique au cours de la


La vie de la Société 
Personalia 
Troisiéme Journée Biométrique (Gembloux, 26 juillet 1960) 


Tome I, N° 2 April, 1960 


TABLE OF CONTENTS 


Les levures sélectionnées en cidrerie—Variations de certains de leurs carac-
tères selon les espèces de pommes utilisées .............. Y. Graff


Analyse de données non orthogonales dans le cas d’une expérience à deux
facteurs 


Etude biométrique de la production en cônes et en graines du pin de
Koekelare—Critères de sélection Anne Lenger et P. Gathy


Blocs randomisés et interactions M. Dalebroux


Publishers for the 


SOCIETE ADOLPHE QUETELET 
7 rue Héger-Bordet, Brussels 1, Belgium 


Presses Universitaires de Bruxelles 
50, av. Franklin Roosevelt 
Brussels 
INFORMATION FOR CONTRIBUTORS


MANUSCRIPTS 


Contributions for Biometrics may be addressed to Dr. Ralph A. Bradley, Depart- 
ment of Statistics, The Florida State University, Tallahassee, Florida, U.S.A.; 
authors residing in the following Society Regions can expedite consideration of papers 
by submitting them to the appropriate Associate Editor, namely: BRITISH RE-
GION: Dr. S. C. Pearce, East Malling Research Station, East Malling, Maidstone,
Kent, England; AUSTRALASIAN REGION: Dr. E. A. Cornish, University of 
Adelaide, Adelaide, Australia; FRENCH REGION: Dr. Georges Teissier, Faculté 
des Sciences de Paris, 1 rue V. Cousin, Paris, France. QUERIES, NOTES, and 
related correspondence should be directed to Dr. D. J. Finney, Department of 
Statistics, University of Aberdeen, Meston Walk, Old Aberdeen, Scotland. Books 
and material for Book Reviews should be sent to Mr. J. G. Skellam, The Nature 
Conservancy, 19 Belgrave Square, London, S.W. 1, England.

MANUSCRIPTS must be submitted in triplicate, with typescript doublespaced 
throughout. Marginal notes may obviate typographical difficulties presented by 
complicated formulae or tables—authors should not attempt editorial instructions 
or markings for the printer. TABLES should be identified by arabic number and 
by a short descriptive title. ILLUSTRATIONS should also be identified by arabic 
number and by a brief caption. (Captions should not be included in illustrations, 
but should be typewritten collectively on an accompanying sheet.) Originals 
should be approximately 8.5 x 11 in. (21.5 x 28 cm.). The original of each chart, 
diagram, or graph should be executed in black on white drawing paper or board, on 
blue tracing linen, or on coordinate paper ruled in blue only; coordinate lines to be 
reproduced should be ruled in black. For printing, illustrations may be reduced to
a fraction of their original dimensions. Lines should therefore be of sufficient thickness, and
decimal points, periods, and stippled dots should be solid black circles large enough 
to reproduce well. Lettering and numerals should be at least 1 mm. high when 
reproduced in a cut 3 in. (7.5 cm.) wide. Photographs should be prints on glossy 
paper with strong contrasts, and if grouped in a plate should be mounted contig- 
uously. All tables and illustrations should be mentioned explicitly in the text. 
REFERENCES (BIBLIOGRAPHIC) should be collectively listed alphabetically 
by author; textual citation by author and year is preferred. 


ABSTRACTS 
Abstracts of papers presented at meetings of the Biometric Society or of its 
regions are printed in Biometrics following such meetings. They should be submitted 
to the person designated to receive them for a particular meeting in exactly the form 
published in Biometrics (except for an Abstract Number), doublespaced on bond 
paper, and in duplicate. Use of formulae requiring display printing is to be avoided. 
NOTICES, ANNOUNCEMENTS, AND BIOMETRIC SOCIETY REPORTS
International and regional reports and notices should be submitted by the 
appropriate officers of the Society and its Regions in duplicate doublespaced on 
separate sheets exactly as they are to be printed in Biometrics. Other material to
be printed in News and Announcements should also be submitted doublespaced
and in duplicate. 


SUSTAINING MEMBERS OF THE SOCIETY
Abbott Laboratories
American Cancer Society, Inc. 
General Foods Corporation, Research Center 
Heisdorf and Nelson Farms, Inc. 
Merck, Sharp and Dohme Research Laboratories 
Schering Corporation 
Smith, Kline and French Laboratories 
E. R. Squibb and Sons 
The Upjohn Company 
Wallace Laboratories, Division of Carter Products 
Wyeth Institute of Applied Biochemistry 
BACK ISSUES


Back issues of Biometrics are available at the following postage-paid
prices in U.S.A. currency: 


Year    Volume    Numbers    Price per        Price per
                             Single Number    Volume (unbound)
1945      1       1 to 6        $1.00            $6.00
1946      2       1 to 6         1.00             6.00
1947      3       1 to 4         1.50             5.00
1948      4       1 to 4         1.50             5.00
1949      5       1 to 4         1.50             5.00
1950      6       1 to 4         1.50             5.00
1951      7       1 to 4         2.00             8.00
1952      8       1 to 4         2.00             8.00
1953      9       1 to 4         2.00             8.00
1954     10       1 to 4         2.00             8.00
1955     11       1 to 4         2.00             8.00
1956     12       1 to 4         2.00             8.00
1957     13       1 to 4         2.00             8.00
1958     14       1 to 4         2.00             8.00
1959     15       1 to 4         2.00             8.00


Reprints of individual articles are not available except to authors at the 
time of printing. Three special issues are among the numbers listed 
above. They are: 


1947 Volume 3 Number 1 The Analysis of Variance 


1951 Volume 7 Number 1 Components of Variance 

1957 Volume 13 Number 3 The Analysis of Covariance 
Also available are: 

Fishery Reprint Series (Selected reprints from Vol. 5) $1.00 

Subject Index (Volumes 1-10) 1.00 

Proceedings, International Biometric Symposium, 

Campinas, Brazil, 1955. 1.00 


Inquiries, non-member subscriptions, and orders for back issues and 
other material listed above should be addressed to: Biometrics, Department
of Statistics, The Florida State University, Tallahassee,
Florida, U.S.A.